Is physical file fragmentation of the mdf file an issue? - sql-server

I have run a defrag report on a SQL Server box and the mdf file for the database is being reported as 99% fragmented. I asked a colleague and they said this may be normal, as it is just the way that type of file works internally.
Is it an issue, and would running a disk defrag on this file improve DB performance? It is a pain to do because it requires a lot of free disk space (it is a huge file), so I want to be sure it is necessary before I start.
Update
Another linked question leading on from the answers: how would a RAID striped configuration affect this? If the data is striped over multiple disks, would defragmentation be less of an issue?

As long as you have determined that, in fact, you have a database response problem this would be a good candidate for improvement. Presumably you've at least determined that there is a problem to be solved, and that it's in the database. (i.e. you've tested queries which are slower than you would like; or you see locks; etc.).
Otherwise it starts to look like guessing and trying. Caching naturally compensates for a lot of this depending on your query patterns.

Short answer: yes, a defrag would help.
Clustered indexes try to organize the index such that only sequential reads will be required. This is great if the .MDF is contiguous on disk. Not so much if the file is spread out all over the disk.
Also, even though SQL Server allocates in 8K pages, having to move the disk head all over the place leads to much slower access times. You really want the file contiguous.

A good way to avoid this is to not rely on autogrow. Give the MDF room to breathe.
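A hedged T-SQL sketch of what that looks like in practice; the database and logical file names (SalesDB, SalesDB_Data) and the sizes are placeholders you would replace with your own, sized against your actual growth rate:
-- Pre-size the data file so it rarely has to autogrow,
-- and use a fixed growth increment instead of a percentage.
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data, SIZE = 50GB, FILEGROWTH = 1GB);
-- Check the current size and growth settings afterwards.
SELECT name, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM SalesDB.sys.database_files;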

Related

What are the effects and side-effects of a database file shrink? - sql-server

I have a database on SQL Server 2012 which is 5 GB in size and growing daily.
I need to shrink that database.
My question is:
Is it good practice to shrink the database weekly or monthly, since we are running out of space? Are there any side effects, such as decreased performance?
Generally, shrinking is bad because it takes up a lot of resources and costs performance, and you would likely need to look at statistics and index fragmentation again afterwards.
Also, a database can rarely really be made "smaller" (the data doesn't take up less space just because you shrink the file), and if it has grown to a specific size, it's because the data takes up that amount of space.
So basically what you need to look at is why the database grows. Is it unintended growth? Is it data files which grow or log files?
If data files - do you store more data than you need?
If log files - how is your backup procedure and how do you handle the log file in that?
Shrinking files usually is treating a symptom that something is wrong, more than treating what is actually wrong.
So it is definitely not a good sign if it grows 'unexpected' or 'too much', and trying to find the cause would be the better route.
(Of course, real life scenarios exists where you sometimes do have to make the 'bad' choice, but well - this was just generally speaking :))
The only time a log file should be shrunk is when there has been some abnormal activity, such as it unexpectedly growing in size. It is bad to shrink a log file regularly; see here to know why.
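For that one-off case, a hedged T-SQL sketch; the logical log file name (MyDb_log) and the 4096 MB target are assumptions to adapt, and it presumes you have already addressed whatever made the log grow (for example, log backups in FULL recovery):
-- See how full each log actually is before touching anything.
DBCC SQLPERF (LOGSPACE);
-- One-off shrink back to a sensible working size (here 4 GB), not to zero.
USE MyDb;
DBCC SHRINKFILE (MyDb_log, 4096);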

TempDB performance crawling; should we reboot?

A bit of background: We have 17 different TempDB database files and 6 TempDB log files on the server. These are spread out on different drives, but are hosted on 2 drive arrays.
I’m seeing Disk IO response times exceeding the recommended limits. Typically you want your disks to respond in 5-10ms, with nothing going over 200ms. We’re seeing random spikes up to 800ms on the TempDB files, but only on one drive array.
Proposed solution: Restart SQL server. While SQL server is shut down, reboot the drive array hosting the majority of the TempDB files. In addition, while SQL is down, redo the network connection to bypass the network switch in an attempt to eliminate any source of slowness on the hardware.
Is this a good idea or a shot in the dark? Any ideas?
Thanks in advance.
17? Who came up with that number? Please read this and this - very few scenarios where > 8 files will help, particularly if you only have 2 underlying arrays/controllers. Some suggestions:
Use an even number of files. Most folks start with 4 or 8, and increase beyond that only when they've proven that they still have contention (and also know that their underlying I/O can actually handle more files and scale with them; in some cases it will have no effect or the exact opposite effect - a different drive letter does not necessarily mean better I/O pathing).
Make sure all of the data files are sized the same, and have identical autogrow settings (see the T-SQL sketch at the end of this answer). Having 17 files with different sizes and autogrowth settings will defeat round-robin - in a lot of cases only one file will be used due to the way SQL Server performs proportional fill. And having an odd number just seems... well, odd to me.
Get rid of your 5 extra log files. They are absolutely useless.
Use trace flag 1117 to make sure that all the data files grow at the same time and (because of 2.) at the same rate. Note though that this trace flag applies to all databases, not just tempdb. More info here.
You can also consider trace flag 1118 to change allocation, but please read this first.
Make sure instant file initialization is on, so that the file doesn't have to be zeroed out when it expands.
Pre-size your tempdb files so that they don't have to grow during normal day-to-day activity. Do not shrink tempdb files because they suddenly got big - this is just a rinse and repeat operation, since if they got that big once, they'll get that big again. It's not like you can lease out the recovered space in the meantime.
When possible, perform DBCC CHECKDB elsewhere. If you're running CHECKDB regularly, yay! Pat yourself on the back. However this can take a toll on tempdb - please see this article on optimizing this operation and pulling it away from your production instance where feasible.
Finally, validate what type of contention you're seeing. You say that tempdb performance crawls, but in what way? How are you measuring this? Some info on determining the exact nature of tempdb bottlenecks here and here and here and here and here.
Have you considered making less use of tempdb explicitly (fewer #temp tables, @table variables, and static cursors - or cursors altogether)? Are you making heavy use of RCSI, or MARS, or LOB-type local variables?
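Putting the sizing and growth points above into a hedged T-SQL sketch - the logical file names (tempdev, temp2, templog2) and the 8 GB / 512 MB figures are assumptions to replace with your own layout, and a file must be empty before it can be removed:
-- Give every tempdb data file you keep the same size and the same fixed growth increment.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   SIZE = 8GB, FILEGROWTH = 512MB);
-- ...repeat for the remaining data files...
-- Drop the extra log files.
ALTER DATABASE tempdb REMOVE FILE templog2;
-- Review the layout afterwards.
SELECT name, type_desc, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files;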

PostgreSQL on RamDisk: Size of work_mem etc.?

I am experimenting with running PostgreSQL on a ramdisk on windows. The way I did it was to simply place the data directory on the ramdisk.
Without having done any specific benchmarks, the performance seems to be magnificent and only CPU bound. My question is what the optimal values for things like work_mem, shared_buffers etc. would be?
Even when the database is in RAM, it takes more than half a minute to run many of my queries. Therefore, I wonder whether it makes sense to create indexes on the tables. The indexes would, of course, need to stay in RAM as well. I should mention that I am using PostgreSQL for a data warehouse (a small one, though).
Edit: I should mention that I am using the RamDisk utility from DataRam.com. It only gives me a bluescreen once in a while, when I configure the ramdisk - never once it is established. I think of this as nostalgic eye candy. ;)
I would definitely create indexes. The engine can use the information they contain for all sorts of optimizations, and it should improve your performance quite a bit. The RAM disk takes the sting out of the worst-case table scans, but that doesn't mean a table scan is faster than doing a proper index lookup.
Yes, use indexes, but work_mem still depends on the size of the machine itself. Of course, one of the reasons you want a high work_mem is so you don't hit disk to do an external ("tape") sort. On a ramdisk this isn't nearly as dangerous. Just remember: you have no data integrity on a RAM disk.
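A minimal sketch of both suggestions; the table and column names (fact_sales, sale_date, amount) are made up, and work_mem can be raised per session for the heavy warehouse queries rather than globally:
-- Index the columns your warehouse queries filter and join on.
CREATE INDEX idx_fact_sales_date ON fact_sales (sale_date);
-- Raise work_mem just for this session so big sorts and hashes stay in memory.
SET work_mem = '256MB';
EXPLAIN ANALYZE
SELECT sale_date, sum(amount) FROM fact_sales GROUP BY sale_date;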

How much faster is a database running in RAM?

I"m looking to run PostgreSQL in RAM for performance enhancement. The database isn't more than 1GB and shouldn't ever grow to more than 5GB. Is it worth doing? Are there any benchmarks out there? Is it buggy?
My second major concern is: How easy is it to back things up when it's running purely in RAM. Is this just like using RAM as tier 1 HD, or is it much more complicated?
It might be worth it if your database is I/O bound. If it's CPU-bound, a RAM drive will make no difference.
But first things first: you should make sure that your database is properly tuned; you can get huge performance gains that way without losing any guarantees. Even a RAM-based database will perform badly if it's not properly tuned. See the PostgreSQL wiki on this, mainly shared_buffers, effective_cache_size, checkpoint_*, default_statistics_target.
Second, if you want to avoid synchronizing disk buffers on every commit (like codeka explained in his comment), disable the synchronous_commit configuration option. When your machine loses power, this will lose the most recent transactions, but your database will still be 100% consistent. In this mode, RAM will be used to buffer all writes, including writes to the transaction log. So with very rare checkpoints, large shared_buffers and wal_buffers, it can actually approach speeds close to those of a RAM drive.
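A hedged sketch of those settings; the values are placeholders to size against your own RAM, and ALTER SYSTEM needs PostgreSQL 9.4+ (on older versions, put the same keys in postgresql.conf):
-- Asynchronous commits: a crash may lose the last few transactions, but the database stays consistent.
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET checkpoint_timeout = '30min';   -- infrequent checkpoints
SELECT pg_reload_conf();                         -- picks up the two settings above
-- These two only take effect after a server restart:
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET wal_buffers = '64MB';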
Also hardware can make a huge difference. 15000 RPM disks can, in practice, be 3x as fast as cheap drives for database workloads. RAID controllers with battery-backed cache also make a significant difference.
If that's still not enough, then it may make sense to consider turning to volatile storage.
Whether to hold your database in memory depends on size and performance, as well as how robust you want it to be with writes. I assume you are writing to your database and that you want to persist the data in case of failure.
Personally, I would not worry about this optimization until I ran into performance issues. It just seems risky to me.
If you are doing a lot of reads and very few writes, a cache might serve your purpose; many ORMs come with one or more caching mechanisms.
From a performance point of view, clustering across a network to another DBMS that does all the disk writing seems a lot less efficient than having a regular DBMS tuned to keep as much as possible in RAM.
Actually... as long as you have enough memory available, your database will already be running fully in RAM. Your filesystem will completely buffer all the data, so it won't make much of a difference.
But... there is of course always a bit of overhead, so you can still try running it all from a RAM drive.
As for the backups, that's just like any other database. You could use the normal Postgres dump utilities to back up the system. Or even better, let it replicate to another server as a backup.
5 to 40 times faster than a disk-resident DBMS. Check out Gartner's Magic Quadrant for Operational DBMSs 2013.
Gartner shows who is strong and, more importantly, notes severe cautions: bugs, errors, lack of support, and vendors whose products are hard to use.

What are the performance characteristics of sqlite with very large database files? [closed]

2020 update, about 11 years after the question was posted and later closed, preventing newer answers.
Almost everything written here is obsolete. Once upon a time sqlite was limited to the memory capacity or to 2 GB of storage (32 bits) or other popular numbers... well, that was a long time ago.
Official limitations are listed here. In practice, sqlite is likely to work as long as there is storage available. It works well with datasets larger than memory; it was originally created when memory was thin, and that was a very important point from the start.
There is absolutely no issue with storing 100 GB of data. It could probably store a TB just fine but eventually that's the point where you need to question whether SQLite is the best tool for the job and you probably want features from a full fledged database (remote clients, concurrent writes, read-only replicas, sharding, etc...).
Original:
I know that sqlite doesn't perform well with extremely large database files even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1GB you may want to consider using an enterprise rdbms - I can't find it anymore; it might relate to an older version of sqlite).
However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.
I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards.
Anyone have any experience with this? Any tips/ideas?
So I did some tests with sqlite for very large files, and came to some conclusions (at least for my specific application).
The tests involve a single sqlite file with either a single table, or multiple tables. Each table had about 8 columns, almost all integers, and 4 indices.
The idea was to insert enough data until sqlite files were about 50GB.
Single Table
I tried to insert multiple rows into a sqlite file with just one table. When the file was about 7GB (sorry, I can't be specific about row counts), insertions were taking far too long. I had estimated that inserting all my data would take 24 hours or so, but it did not complete even after 48 hours.
This leads me to conclude that a single, very large sqlite table will have issues with insertions, and probably other operations as well.
I guess this is no surprise: as the table gets larger, inserting and updating all the indices takes longer.
Multiple Tables
I then tried splitting the data by time over several tables, one table per day. The data from the original single table was split into ~700 tables.
This setup had no problems with insertion; it did not take longer as time progressed, since a new table was created for every day.
Vacuum Issues
As pointed out by i_like_caffeine, the VACUUM command becomes more of a problem the larger the sqlite file is. As more inserts/deletes are done, the fragmentation of the file on disk gets worse, so the goal is to periodically VACUUM to optimize the file and recover file space.
However, as the documentation points out, a full copy of the database is made to do a vacuum, which takes a very long time to complete. So the smaller the database, the faster this operation will finish.
Conclusions
For my specific application, I'll probably be splitting out data over several db files, one per day, to get the best of both vacuum performance and insertion/delete speed.
This complicates queries, but for me, it's a worthwhile tradeoff to be able to index this much data. An additional advantage is that I can just delete a whole db file to drop a day's worth of data (a common operation for my application).
I'd probably have to monitor table size per file as well to see when the speed will become a problem.
It's too bad that there doesn't seem to be an incremental vacuum method other than auto_vacuum. I can't use it because my goal for vacuuming is to defragment the file (file space isn't a big deal), which auto_vacuum does not do. In fact, the documentation states it may make fragmentation worse, so I have to resort to periodically doing a full vacuum on the file.
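For reference, a short sketch of the two options being compared; these are standard SQLite statements, but whether incremental vacuum helps depends on the goal - it releases free pages without rewriting the file, so it does not defragment the way a full VACUUM does:
-- Full vacuum: rewrites the whole database into a new file (can temporarily need about twice the space).
VACUUM;
-- Incremental auto-vacuum: must be enabled before tables are created, or be followed by a full VACUUM once.
PRAGMA auto_vacuum = INCREMENTAL;
PRAGMA incremental_vacuum(1000);   -- release up to 1000 free pages
PRAGMA freelist_count;             -- how many free pages are currently in the file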
We are using DBs of 50 GB+ on our platform. No complaints - it works great.
Make sure you are doing everything right! Are you using prepared statements?
*SQLITE 3.7.3
Transactions
Prepared statements
Apply these settings (right after you create the DB)
PRAGMA main.page_size = 4096;           -- only takes effect on a new/empty database (or after VACUUM)
PRAGMA main.cache_size = 10000;         -- cache size is in pages, not bytes
PRAGMA main.locking_mode = EXCLUSIVE;   -- single writer keeps its lock between transactions
PRAGMA main.synchronous = NORMAL;       -- fewer fsyncs than the default FULL
PRAGMA main.journal_mode = WAL;         -- write-ahead logging
PRAGMA main.cache_size = 5000;          -- note: this overrides the 10000 set above
Hope this will help others, works great here
I've created SQLite databases up to 3.5GB in size with no noticeable performance issues. If I remember correctly, I think SQLite2 might have had some lower limits, but I don't think SQLite3 has any such issues.
According to the SQLite Limits page, the maximum size of each database page is 32K. And the maximum pages in a database is 1024^3. So by my math that comes out to 32 terabytes as the maximum size. I think you'll hit your file system's limits before hitting SQLite's!
Much of the reason that it took > 48 hours to do your inserts is your indexes. It is far faster to:
1 - Drop all indexes
2 - Do all inserts
3 - Create indexes again
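A hedged SQLite sketch of that pattern; the table, column, and index names are invented, and the single big transaction is what keeps the bulk insert fast:
DROP INDEX IF EXISTS idx_events_time;            -- 1 - drop the indexes
BEGIN;                                           -- 2 - one large transaction for the bulk insert
INSERT INTO events (ts, sensor_id, value) VALUES (1700000000, 42, 3.14);
-- ...many more inserts, ideally via a prepared statement in the host language...
COMMIT;
CREATE INDEX idx_events_time ON events (ts);     -- 3 - rebuild the indexes once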
Besides the usual recommendations:
Drop indexes for bulk inserts.
Batch inserts/updates in large transactions.
Tune your buffer cache / disable the journal with PRAGMAs.
Use a 64-bit machine (to be able to use lots of cache™).
[added July 2014] Use common table expressions (CTEs) instead of running multiple SQL queries! Requires SQLite release 3.8.3.
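As a tiny illustration of the CTE point (SQLite 3.8.3+; the table and column names are made up), one statement can replace a chain of intermediate queries:
-- Daily totals plus their overall average in a single query instead of two.
WITH daily AS (
    SELECT date(ts, 'unixepoch') AS day, sum(value) AS total
    FROM events
    GROUP BY day
)
SELECT day, total, (SELECT avg(total) FROM daily) AS avg_total
FROM daily;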
I have learnt the following from my experience with SQLite3:
For maximum insert speed, don't use a schema with any column constraints. (Alter the table later as needed; note that you can't add constraints with ALTER TABLE.)
Optimize your schema to store what you need. Sometimes this means breaking down tables and/or even compressing/transforming your data before inserting it into the database. A great example is storing IP addresses as (long) integers.
One table per db file - to minimize lock contention. (Use ATTACH DATABASE if you want a single connection object; see the sketch after this list.)
SQLite can store different types of data in the same column (dynamic typing), use that to your advantage.
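A hedged sketch of the one-table-per-file and integers-for-IPs points; the file and table names are illustrative:
-- One connection, several single-table database files.
ATTACH DATABASE 'events_2024_01_01.db' AS d1;
ATTACH DATABASE 'events_2024_01_02.db' AS d2;
SELECT count(*) FROM d1.events
UNION ALL
SELECT count(*) FROM d2.events;
-- Store an IPv4 address as one integer instead of a 7-15 character string,
-- e.g. 192.168.0.1 -> 3232235521 (conversion done in the host application).
INSERT INTO d1.events (ts, ip) VALUES (1700000000, 3232235521);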
Question/comment welcome. ;-)
I have a 7GB SQLite database.
Performing a particular query with an inner join takes 2.6 s.
In order to speed this up I tried adding indexes. Depending on which index(es) I added, sometimes the query went down to 0.1s and sometimes it went UP to as much as 7s.
I think the problem in my case was that if a column is highly duplicated (low cardinality), then adding an index degrades performance :(
There used to be a statement in the SQLite documentation that the practical size limit of a database file was a few dozen GB. That was mostly due to the need for SQLite to "allocate a bitmap of dirty pages" whenever you started a transaction, at 256 bytes of RAM for each MB in the database. Inserting into a 50 GB DB file would thus require roughly 50 × 2^10 MB × 2^8 B/MB ≈ 13 MB of RAM for that bitmap, allocated afresh on every transaction.
But as of recent versions of SQLite, this is no longer needed. Read more here.
I think the main complaints about sqlite scaling are:
Single process write.
No mirroring.
No replication.
I've experienced problems with large sqlite files when using the vacuum command.
I haven't tried the auto_vacuum feature yet. If you expect to be updating and deleting data often then this is worth looking at.
