I have a TempDB database with 8 files, but they have different sizes, which is not recommended, as follows:
TempDB Files
I plan to resize them all to the same size, 20 GB each, as recommended, so the TempDB total will be 160 GB. My question is: if SQL Server executes an operation that needs 23 GB, will the remaining 3 GB spill over to another file, or will the files grow so that the operation fits in just one tempdb file?
If the files grow, instead of 160 GB I will end up with 184 GB, just because of 3 GB.
My question is: if SQL Server executes an operation that needs 23 GB, will the remaining 3 GB spill over to another file, or will the files grow so that the operation fits in just one tempdb file?
SQL Server uses a proportional fill algorithm to fill up data files. This means it will try to spread the data across all your files based on the amount of free space in each file: the file with the most free space receives the most data.
If all your files are of equal size and you have 8 files, then the 23 GB will be distributed fairly evenly among those 8 files, about 2.9 GB per file, so no single file has to grow to hold the whole operation.
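To make that concrete, here is a toy C sketch of the proportional-fill idea (my own illustration, not SQL Server's actual implementation): each file receives a share of the request proportional to its free space, so with 8 equally sized files a 23 GB request works out to roughly 2.9 GB per file.

    #include <stdio.h>

    /* Toy model of proportional fill: each data file receives a share of the
       request proportional to its current free space. The numbers are only
       illustrative; the real algorithm allocates extent by extent. */
    int main(void) {
        double free_gb[8] = {20, 20, 20, 20, 20, 20, 20, 20}; /* free space per file */
        double request_gb = 23.0;

        double total_free = 0.0;
        for (int i = 0; i < 8; i++)
            total_free += free_gb[i];

        for (int i = 0; i < 8; i++)
            printf("file %d gets %.2f GB\n", i + 1,
                   request_gb * free_gb[i] / total_free);

        return 0;
    }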
Related
Oracle allows creating a tablespace with multiple data files. What is the difference between one data file of 1 TB and two data files of 500 GB each? Is there any performance gain?
Performance? Could be. If you have one large (or two smaller) data files on the same hard disk, that will probably run somewhat slower than having two smaller data files on different hard disks. You know, you and me accessing data at the same time: the HDD head will have to "jump" from one place to another to send data to both of us. If those were two disks, there's a chance that each disk could serve its data separately, and that would be faster.
My code does the following:
1. do 100 times: open a new file, write 10 MB of data, close it
2. open the 100 files together, read and merge their data into a larger file
3. do steps 1 and 2 many times in a loop
I was wondering if I can keep the 100 files open, without opening and closing them so many times. What I can do is fopen them with "w+": after writing, I set the position to the beginning to read; after reading, I set the position to the beginning to write, and so on.
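For illustration, here is a minimal C sketch of that write/seek/read cycle for a single stream (the real program would keep an array of 100 FILE pointers; the file name and loop counts are just placeholders). Note the rewind between writing and reading: C requires an fflush or a file-positioning call when switching directions on a stream opened in update mode.

    #include <stdio.h>

    int main(void) {
        /* One buffer file kept open across several write-then-read rounds. */
        FILE *f = fopen("buffer00.dat", "w+");
        if (!f) { perror("fopen"); return 1; }

        const char chunk[] = "intermediate results...";

        for (int round = 0; round < 3; round++) {
            /* write phase: overwrite the buffer from the start */
            fwrite(chunk, 1, sizeof chunk - 1, f);

            /* switching from writing to reading needs fflush or a seek */
            rewind(f);

            char buf[64];
            size_t n = fread(buf, 1, sizeof buf, f);
            printf("round %d: read back %zu bytes\n", round, n);

            /* switching from reading back to writing needs a seek as well */
            rewind(f);
        }

        fclose(f);
        return 0;
    }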
The questions are:
If I read after writing without closing, do we always read all the written data?
Would this save some overhead? File open and close must have some overhead, but is this overhead large enough to be worth saving?
Based on the comments and discussion, I will explain why I need to do this in my work. It is also related to my other post:
how to convert large row-based tables into column-based tables efficently
I have a calculation that generates a stream of results. So far the results are saved in a row-storage table. This table has 1M columns, and each column could be 10M long. Actually, each column is one attribute the calculation produces. As the calculation runs, I dump and append the intermediate results to the table. The intermediate results could be 2 or 3 double values for each column. I wanted to dump them soon because they already consume more than 16 MB of memory, and the calculation needs more memory. This ends up as a table like the following:
aabbcc...zzaabbcc..zz.........aabb...zz
A row of data is stored together. The problem happens when I want to analyze the data column by column: I have to read 16 bytes, then seek to the next row to read another 16 bytes, and keep going. There are too many seeks; it is much slower than if each column were stored contiguously so that I could read it sequentially.
I can make the calculation dump less frequently. But to make the later reads more efficient, I may want to have 4 KB of data stored together, since I assume each fread fetches 4 KB by default even if I only read 16 bytes. But that means I would need to buffer 1M * 4 KB = 4 GB in memory...
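As a side note on the "each fread gets 4K by default" assumption: the transfer size comes from the stdio stream buffer (BUFSIZ, often 4 KB or 8 KB), and it can be changed per stream with setvbuf. A small sketch, with an arbitrary 64 KB buffer and an illustrative file name:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        FILE *f = fopen("results.dat", "rb");
        if (!f) { perror("fopen"); return 1; }

        /* Ask stdio to fill a 64 KB buffer per underlying read instead of the
           default BUFSIZ-sized one; must be set before the first read. */
        size_t bufsize = 64 * 1024;
        char *buf = malloc(bufsize);
        if (!buf || setvbuf(f, buf, _IOFBF, bufsize) != 0)
            fprintf(stderr, "could not set stream buffer\n");

        char rec[16];
        size_t n = fread(rec, 1, sizeof rec, f);   /* still a 16-byte logical read */
        printf("read %zu bytes\n", n);

        fclose(f);    /* close the stream before freeing its buffer */
        free(buf);
        return 0;
    }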
So I was thinking about whether I can merge the fragmented data into larger chunks, as that post suggests:
how to convert large row-based tables into column-based tables efficently
So I wanted to use files as offline buffers. I may need 256 files to get 4 KB of contiguous data per column after the merge, if each file contains 2 doubles for each of the 1M columns. This work can be done asynchronously with respect to the main calculation. But I wanted to make sure the merge overhead is small, so that when it runs in parallel it can finish before the main calculation is done. That is how I came up with this question.
I guess this is closely related to how column-based databases are constructed. When people build them, do they run into similar issues? Is there any description of how this works at creation time?
You can use w+ as long as the maximum number of open files on your system allows it; this is usually 255 or 1024, and can be set (e.g. on Unix by ulimit).
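For completeness, a program can also inspect (and, up to the hard limit, raise) its own open-file limit with the POSIX getrlimit/setrlimit calls; this is a sketch of mine, not something from the original discussion:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }

        printf("open-file limit: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

        /* Raise the soft limit if ~100 streams plus stdio might not fit. */
        if (rl.rlim_cur < 256 && rl.rlim_max >= 256) {
            rl.rlim_cur = 256;
            if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                perror("setrlimit");
        }
        return 0;
    }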
But I'm not too sure this will be worth the effort.
On the other hand, 100 files of 10M each is one gigabyte; you might want to experiment with a RAM disk. Or with a large file system cache.
I suspect that huger savings might be reaped by analyzing your specific problem structure. Why is it 100 files? Why 10 M? What kind of "merge" are you doing? Are those 100 files always accessed in the same order and with the same frequency? Could some data be kept in RAM and never be written at all?
Update
So, you have several large buffers like:
ABCDEFG...
ABCDEFG...
ABCDEFG...
and you want to pivot them so they read
AAA...
BBB...
CCC...
If you already know the total size (i.e., you know that you are going to write 10 GB of data), you can do this with two files, pre-allocating the output file and using fseek() to write into it. With memory-mapped files, this should be quite efficient. In practice, row Y, column X (of 1,000,000 columns) has been dumped at offset 16*X in file Y.dat; you need to write it to offset 16*(X*R + Y) in largeoutput.dat, where R is the total number of rows, so that each column ends up contiguous.
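A rough C sketch of that scheme using fseek(), with the Y.dat / largeoutput.dat names from above; the row and column counts are placeholders, and a real implementation would buffer whole blocks or mmap the output rather than seeking for every 16-byte record:

    #define _POSIX_C_SOURCE 200809L /* for fseeko/off_t */
    #define _FILE_OFFSET_BITS 64    /* 64-bit offsets so the output can exceed 2 GB */
    #include <stdio.h>

    #define NUM_ROWS 100            /* placeholder: one dump file per row */
    #define NUM_COLS 1000000L       /* placeholder: 1M columns, 16 bytes each */
    #define REC_SIZE 16

    int main(void) {
        FILE *out = fopen("largeoutput.dat", "wb");
        if (!out) { perror("largeoutput.dat"); return 1; }

        char rec[REC_SIZE];
        char name[64];

        for (long y = 0; y < NUM_ROWS; y++) {
            snprintf(name, sizeof name, "%ld.dat", y);        /* row file Y.dat */
            FILE *in = fopen(name, "rb");
            if (!in) { perror(name); return 1; }

            for (long x = 0; x < NUM_COLS; x++) {
                /* record (row y, column x) sits at offset 16*x in Y.dat */
                if (fread(rec, 1, REC_SIZE, in) != REC_SIZE) { perror("fread"); return 1; }

                /* column-major target: column x is contiguous, row y inside it */
                long long off = (long long)REC_SIZE * ((long long)x * NUM_ROWS + y);
                if (fseeko(out, (off_t)off, SEEK_SET) != 0) { perror("fseeko"); return 1; }
                if (fwrite(rec, 1, REC_SIZE, out) != REC_SIZE) { perror("fwrite"); return 1; }
            }
            fclose(in);
        }
        fclose(out);
        return 0;
    }

Writing one 16-byte record per seek is only to keep the sketch short; batching a block of records per column, or mapping largeoutput.dat into memory as suggested, avoids most of the per-record seek overhead.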
Actually, you could write the data even during the first calculation. Or you could have two processes communicating via a pipe, one calculating and one writing to both the row-major and column-major files, so that you can monitor the performance of each.
Frankly, I think that adding more RAM and/or a fast I/O layer (SSD maybe?) could get you more bang for the same buck. Your time costs too, and the memory will remain available after this one work has been completed.
Yes. You can keep the 100 files open without doing the opening-closing-opening cycle. Most systems do have a limit on the number of open files though.
If I read after writing without closing, do we always read all the written data?
It depends on you. You can fseek to wherever you want in the file and read the data from there; just remember that on a stream opened in update mode ("w+"), C requires an fflush or a file-positioning call such as fseek between a write and a following read. Beyond that, it is all down to you and your logic.
Would this save some overhead? File open and close must have some overhead, but is this overhead large enough to be worth saving?
This would definitely save some overhead, such as additional unnecessary I/O operations. Also, on some systems the content you write to a file is not immediately flushed to the physical file; it may be buffered and flushed periodically, or at the time of fclose.
So such overheads are saved, but the real question is: what do you achieve by saving them? How does it fit into the overall picture of your application? That is the call you must make before deciding on the logic.
I have an array of 3 different drives which I use with the single profile (no RAID). I don't use RAID because the data isn't important enough to justify spending extra money on additional drives.
But what I could not figure out exactly is at what granularity the data is distributed across the 3 drives.
I could find this on the wiki page:
When you have drives with differing sizes and want to use the full
capacity of each drive, you have to use the single profile for the
data blocks, rather than raid0
As far as I understand, this means that it is not whole files that are allocated to one of the 3 drives, but each of a file's data blocks individually.
This is unfortunate, because losing just 1 drive will destroy the whole array. Is there a way to balance a single-profile array at the file level?
I would be fine with the risk of losing all files on 1 drive in the array but not losing the whole array if 1 drive fails.
I am dealing with a daily batch procedure which, time and again, throws the above error during the insert procedures in the batch.
The specified tablespace currently contains 5 datafiles, all only 40-50% occupied. But sometimes, while trying to run an insert query, we get "Unable to create initial extent...". Currently the problem is being worked around by adding datafiles because of batch urgency, due to which almost half of the datafile space seems to be wasted.
I do not have enough privileges to run SYSDBA queries, but I need to come up with every possible reason for such behavior. For now, I have the following information:
The tablespace is autoextensible.
No. of datafiles: 6 (recently added one during batch issue)
Each datafile size: 29.3 GB
Blocks: 3,840,000
Increment by: 100 MB
Max size: 29.3 GB
From my research, such a problem might be due to fragmentation in the tablespace [the most solid conclusion I have reached so far]. Could there be any other potential cause?
The first thing to know is that even if you have freed some space inside your tablespaces, this space isn't released back to the system. There are ways to shrink the tablespaces and reduce fragmentation within them. Here is what you should check in such a case:
The tablespace's size is limited; this limit can be extended.
There is no more space on the file system
check the db file size; e.g. on unix do a df -g /my_db/path/
note that in some cases the figures above are misleading: a process may still be locking the disk space and must be killed for the space to really be released.
I checked the size of the file itself - it said it's 4.02 GB. I checked the size mentioned in properties of the database using SSMSE - it said the size was somewhere above 4000 MB.
I then executed sp_spaceused and it said that the file size was above 4500 MB and the unallocated space was close to 4100 MB.
I'm a little confused as to how this works. I'm using SQL Server Express so I need to monitor the db size to figure out if it's reaching its limit. How do I figure out the in-use db size?
You're on the right track for analyzing the file use. The actual numbers will vary slightly for different reasons. For example, the file size of over 4500 MB and the unallocated space of about 4100 MB reported by sp_spaceused include the transaction log file (the file with the .ldf extension). The transaction log is separate from your user data, so for user-data space analysis, exclude the space allocated to the transaction log.
The file size represents disk space that has been reserved for your database, but not necessarily used by the database. You have to look at the SQL Server information to learn how much of that file is actually in use.
The database size reflects the approximate amount of file space that is allocated to the database. The unallocated space is the amount of file space that is not yet in use by the database and is available for new data. So your current space used is approximately 400 MB (4500 MB - 4100 MB), and the unused space is about 4100 MB.
If storage is your problem, try running DBCC SHRINKFILE or DBCC SHRINKDATABASE on the database regularly. After that, you can check the size of the file.
Good luck.