What is the best practice for shrinking a SQL Server database? - sql-server

I have read in many places that shrinking a database is not recommended practice, as it causes fragmentation that leads to slower performance.
ref:
https://www.brentozar.com/archive/2017/12/whats-bad-shrinking-databases-dbcc-shrinkdatabase/
https://straightpathsql.com/archives/2009/01/dont-touch-that-shrink-button/
But that appears to be the case for the data file; if the log is full, shrinking the log should not be a problem, right?
If the data file is huge and takes a lot of space, and I need more space to insert and update some new data, shrinking apparently reduces the size of the file on the drive, so I assumed I could use the freed space to insert new data. But if shrinking is not recommended, how do I resolve this, and when is the best case to use shrink?

if the data file is huge and takes a lot of space, and I need more
space to insert and update some new data, shrinking apparently reduces
the size of the file on the drive, so I assumed I could use the freed
space to insert new data.
If your data file takes a lot of space, that does not mean this space is empty.
You should use sp_spaceused to determine whether there is unused space within the data file.
If there is unused space, it will already be used "to insert and update some new data", and if there isn't, shrinking will change nothing: shrink does not delete your data; all it does is move data toward the beginning of the file to make space at the end, so that space can be given back to the OS.
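A minimal sketch of how you might check this; the database context is whatever database you are worried about, and the column math assumes the usual 8 KB pages:

-- free vs. allocated space at the database level
EXEC sp_spaceused;

-- used vs. allocated space per file (size and SpaceUsed are reported in 8 KB pages)
SELECT name,
       size / 128                                     AS allocated_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128          AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files;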
Shrinking a data file can be useful when you had a data file of 2 TB, 1 TB of data was deleted, and you don't plan to insert another TB of data in the next 10 years.
You can imagine your data file as a box 1m x 1m x 1m. If only half of the box is full of toys, you can still put other toys into this box (do inserts/updates) even without shrinking. What shrink does instead is gather all the toys into one corner and then cut the box down to 50cm x 50cm x 50cm. This way your room (the OS) now has more free space, because your toy box takes only half the space it took prior to the shrink.
...And if your box was already full, you cannot add more toys even if you shrink.
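If you really are in that one-off situation (a large amount of data was deleted and is not coming back), a minimal sketch of the shrink itself could look like this; the logical file name and target size are placeholders, not anything from your system:

-- shrink the data file down to roughly 500 GB, leaving headroom for normal growth
DBCC SHRINKFILE (N'MyDatabase_Data', 512000);  -- target size is in MB

Keep in mind that the page movement fragments indexes, so plan an index rebuild/reorganize afterwards.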
if the log is full, shrinking the log should not be a problem, right?
Shrinking the log is another process; nothing can be moved inside the log file, so in this sense shrink cannot do as much harm as in the case of the data file: it does not require server resources, it does not cause any fragmentation, etc.
But whether it succeeds or not depends on the cause of your "log is full".
If your log is full due to the full recovery model, shrinking the log file will not change anything: the log is retained to give you the possibility of keeping the log backup chain (or to make mirroring, log shipping, etc. possible).
If instead your database recovery model is simple, and there was some trouble with a transaction that was open for a long period of time, or there was a huge data load (maybe fully logged, such as INSERT INTO without TABLOCK) and your log file became bigger than the data file, and you have found and fixed the problem and don't need such a huge log file, then yes, you can shrink it to a reasonable size, and it's not harmful.
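A minimal sketch of that check-then-shrink sequence; the database name, logical log file name, and target size below are placeholders:

-- find out why the log cannot be truncated (NOTHING means it can)
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDatabase';

-- once the cause is fixed, shrink the log back to a reasonable size (target is in MB)
DBCC SHRINKFILE (N'MyDatabase_Log', 8192);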

Related

Appending + Removing Postgres Text Array elements. Resulting in massive table sizes

UPDATE db.table SET logs = array_prepend('some things happened', logs[0:1000]) WHERE id = 'foo';
This query simply prepends text to a text array, and removes elements from the array and limits the array to 1,000 elements. It works, but the table size on the disk rapidly swells to multiple GB (The table should only be around 150MB). Am I doing something wrong? Is this a bug in PostgreSQL? I'm running PostgreSQL 11.9. If I don't run a full vacuum, PostgreSQL will eventually use up all available disk space.
This query is for a turn-based game, and it stores logs about what's happening to the player for debugging purposes.
This is expected behavior. The space is only cleared by vacuum/autovacuum. However, there's actually not a huge cost to having that used space around, as Postgres will reuse it for new row versions instead of growing the file further.
Part of the issue is that modifying a column value requires rewriting the entire row (or in this case, your column is probably getting TOASTed, so rewriting the pointer to the TOAST table and writing a new value in the TOAST table), so each update you do rewrites everything you have stored. For large values this adds up quickly.
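To see where the space is going and to reclaim it by hand, something like the following can help; the table name player_logs is a placeholder for your logs table:

-- heap only vs. heap + TOAST + indexes
SELECT pg_size_pretty(pg_relation_size('player_logs'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('player_logs')) AS total_size;

-- marks dead row versions as reusable (what autovacuum does in the background)
VACUUM (VERBOSE) player_logs;

-- rewrites the table and returns the space to the OS, but takes an exclusive lock
VACUUM FULL player_logs;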
If you're really worried about it, I think normalizing this might be a good choice, or you could switch to storing this data in something better designed for append-only data. Or you could use an FDW designed for storing append-only data like this outside the normal storage mechanisms, usually as a file on disk.

Why does SQLite store hundreds of null bytes?

In a database I'm creating, I was curious why the size was so much larger than the contents, and checked out the hex dump. In a 4 kB file (single row as a test), there are two major chunks that are roughly 900 and 1000 bytes, along with a couple of smaller ones, that are all null bytes (0x00).
I can't think of any logical reason it would be advantageous to store thousands of null bytes, increasing the size of the database significantly.
Can someone explain this to me? I've tried searching, and haven't been able to find anything.
The structure of a SQLite database file (*.sqlite) is described on this page:
https://www.sqlite.org/fileformat.html
SQLite files are partitioned into "pages" which are between 512 and 65536 bytes long - in your case I imagine the page size is probably 1 KiB. If you're storing data that's smaller than 1 KiB (as you are with your single test row, which I imagine is maybe 100 bytes long?) then that leaves roughly 900 bytes unused - and unused (deallocated) space is usually zeroed out before (and after) use.
It's the same way computer working memory (RAM) works - as RAM also uses paging.
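If you want to confirm this for your own file, SQLite exposes the page layout through pragmas; a minimal sketch, run against your database in the sqlite3 shell:

PRAGMA page_size;       -- bytes per page for this database file
PRAGMA page_count;      -- total pages allocated (page_size * page_count = file size on disk)
PRAGMA freelist_count;  -- pages that are allocated but currently empty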
I imagine you expected the file to be very compact, with a terse internal representation; this is the case with some file formats, such as old-school OLE-based Office documents, but others (and especially database files) require a different file layout that is optimized simultaneously for quick access and quick insertion of new data, and is also arranged to help prevent internal fragmentation - this comes at the cost of some wasted space.
A quick thought-experiment will demonstrate why mutable (i.e. non-read-only) databases cannot use a compact internal file structure:
Think of a single database table as being like a CSV file (and CSVs themselves are compact enough with very little wasted space).
You can INSERT new rows by appending to the end of the file.
You can DELETE an existing row by simply overwriting the row's space in the file with zeroes. Note that you cannot actually "delete" the space by "moving" data (like using the Backspace key in Notepad) because that means copying all of the data in the file around - this is largely a bad idea.
You can UPDATE a row by checking to see if the new row's width will fit in the current space (and overwriting the remaining space with zeros), or if not, then appending a new row at the end and overwriting the existing row (à la INSERT then DELETE).
But what if you have two database tables (with different columns) and need to store them in the same file? One approach is to simply mix each table's rows in the same flat file - but for other reasons that's a bad idea. So instead, inside your entire *.sqlite file, you create "sub-files" that have a known, fixed size (e.g. 4 KiB) and store only rows for a single table until the sub-file is full; they also store a pointer (like a linked list) to the next sub-file that contains the rest of the data, if any. Then you simply create new sub-files as you need more space inside the file and set up their next-file pointers. These sub-files are what a "page" is in a database file, and this is how you can have multiple read/write database tables contained within the same parent filesystem file.
Then in addition to these pages that store table data, you also need to store the indexes (which are what allow you to locate a table row near-instantly without needing to scan the entire table or file) and other metadata, such as the column definitions themselves - and often they're stored in pages too. Relational (tabular) database files can be considered filesystems in their own right (just encapsulated in a parent filesystem... which could be inside a *.vhd file... which could be buried inside a varbinary database column... inside another filesystem), and even the database systems themselves have been compared to operating systems: they offer an environment for programs (stored procedures) to run, they offer IO services, and so on. It's almost circular if you look at the old COBOL-based mainframes from the 1970s, when all of your IO operations were restricted to record management operations (insert, update, delete).

sqlite db with fewer records consumes more space than another db with more records (same structure) [duplicate]

I have an SQLite database.
I created the tables and filled them with a considerable amount of data.
Then I cleared the database by deleting and recreating the tables. I confirmed that all the data had been removed and the tables were empty by looking at them using SQLite Administrator.
The problem is that the size of the database file (*.db3) remained the same after it had been cleared.
This is of course not desirable as I would like to regain the space that was taken up by the data once I clear it.
Did anyone make a similar observation and/or know what is going on?
What can be done about it?
From here:
When an object (table, index, trigger, or view) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary. Also, frequent inserts, updates, and deletes can cause the information in the database to become fragmented - scattered all across the database file rather than clustered together in one place.
The VACUUM command cleans the main database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure.
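So for your case the straightforward fix is to run VACUUM after the bulk delete, or enable auto_vacuum if you want SQLite to hand pages back on its own; a minimal sketch:

VACUUM;                     -- rebuilds the file and drops free pages, so the *.db3 shrinks on disk
PRAGMA auto_vacuum;         -- 0 = NONE (default), 1 = FULL, 2 = INCREMENTAL
PRAGMA auto_vacuum = FULL;  -- takes effect on a new database, or after a VACUUM on an existing one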
Database sizes work like watermarks: if the water rises, the watermark goes up; when the water recedes, the watermark stays where it was.
You should look into shrinking databases.

sqlite3 when disk storage reached

I have 2M bytes of storage to store some logs on our embedded device (Linux based). As the size is very limited, we have to implement some approach to handle the case where the max size is reached. One option is a circular buffer with mmap for persistence. The other option we are thinking of is to use sqlite3 (when the max size is reached, delete the oldest entries, then insert new ones).
However, as far as I understand, sqlite3 uses pages (4096 bytes by default, configurable). My questions are:
How do I calculate disk usage from sqlite3? Besides the database file size, what else needs to be counted here?
What happens when 2M is reached? Is there any particular info or error I could check in order to delete the oldest entries?
Is it a good approach (performance-wise, fragmentation-wise) to delete entries, then insert new ones?
Any suggestions or feedbacks are welcome.
It is not possible to calculate the disk usage; you have to monitor the file. Besides the actual database file, there is also the rollback journal, whose size corresponds to the amount of data changed in a transaction.
When the disk is full, you get an error code of SQLITE_FULL (or maybe SQLITE_IOERR_WRITE, depending on the OS).
You can limit the database size with PRAGMA max_page_count.
Deleted rows result in more free space in that particular database page. (This never changes the file size, unless you run VACUUM.)
When inserting new rows at the other end of the table, that space can be reused only once an entire page has been freed because all of its rows were deleted.
So you should try to delete rows in large chunks, if possible.
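A minimal sketch of how the pieces could fit together for a 2 MB cap; the page size, table, and column names are placeholders, and note that max_page_count is a per-connection setting, so it has to be reapplied each time the database is opened:

PRAGMA page_size = 512;        -- set before the database file is created; small pages free up sooner
PRAGMA max_page_count = 4096;  -- 4096 pages * 512 bytes = 2 MB hard cap; inserts beyond it fail with SQLITE_FULL

-- when an insert fails with SQLITE_FULL, delete the oldest entries in one large chunk, then retry
DELETE FROM logs
WHERE id IN (SELECT id FROM logs ORDER BY id LIMIT 500);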

Why is my Firebird database so huge for the amount of data it's holding?

I've been playing around with database programming lately, and I noticed something a little bit alarming.
I took a binary flat file saved in a proprietary, non-compressed format that holds several different types of records, built schemas to represent the same records, and uploaded the data into a Firebird database. The original flat file was about 7 MB. The database is over 70 MB!
I can understand that there's some overhead to describe the tables themselves, and I've got a few minimal indices (mostly PKs) and FKs on various tables, and all that is going to take up some space, but a factor of 10 just seems a little bit ridiculous. Does anyone have any ideas as to what could be bloating up this database so badly, and how I could bring the size down?
From the Firebird FAQ:
Many users wonder why they don't get their disk space back when they delete a lot of records from database.
The reason is that it is an expensive operation; it would require a lot of disk writes and memory - just like defragmenting a hard disk partition. The parts of the database (pages) that were used by such data are marked as empty, and Firebird will reuse them the next time it needs to write new data.
If disk space is critical for you, you can get the space back by doing a backup and then a restore. Since you're doing the backup in order to restore right away, it's wise to use the "inhibit garbage collection" switch (-g in gbak), which will make the backup go A LOT FASTER. Garbage collection is used to clean up your database, and as it is a maintenance task, it's often done together with backup (as backup has to go through the entire database anyway). However, you're soon going to ditch that database file, so there's no need to clean it up.
gstat is the tool to examine table sizes etc.; maybe it will give you some hints about what's using the space.
In addition, you may also have multiple snapshots or other garbage in the database file; it depends on how you add data to the database. The database file never shrinks automatically, but a backup/restore cycle gets rid of junk and empty space.
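A minimal sketch of that backup/restore cycle from the command line; the file names and credentials are placeholders. The first command backs up with garbage collection inhibited (-g), the second restores into a fresh database file:

gbak -b -g -user SYSDBA -password masterkey mydb.fdb mydb.fbk
gbak -c -user SYSDBA -password masterkey mydb.fbk mydb_restored.fdb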
Firebird fills pages to a certain fill factor, not completely full.
e.g. a db page can contain 70% data and 30% free space, to speed up future record updates and deletes without moving rows to a new db page.
CONFIGREVISIONSTORE (213)
Primary pointer page: 572, Index root page: 573
Data pages: 2122, data page slots: 2122, average fill: 82%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 79
80 - 99% = 2042
The same goes for indexes.
You can see how big the db really is when you do a backup and restore with the option
-USE_ALL_SPACE
then the database will be restored without this space reservation.
You must also know that not only pages with data are allocated; some pages are preallocated (empty) for future fast use, without expensive disk allocation and fragmentation.
as "Peter G." say - database is much more then flat file and is optimized to speed up thinks.
and as "Harriv" say - you can get details about database file with gstat
use command like gstat -
here are details about its output
