I have a strange situation with a SQL Server database where the actual data in the table is roughly 320 MiB. This is determined by summing up the DATALENGTH of all the columns, which ignores fragmentation, index space and other SQL Server internal overhead. The problem, though, is that the table is roughly 40 GiB in size and it's growing at an alarming rate, very disproportionate to the amount of data, in bytes or rows, that was inserted.
I used the sys.dm_db_index_physical_stats function to look at the physical data and the roughly 40 GiB of data is tied up in LOB_DATA.
Most of the 320 MiB that makes up the table contents is of type ntext. Now, my question is: how come SQL Server has allocated 40 GiB of LOB_DATA when there's only roughly 310 MiB of ntext data?
Will the problem go away if we convert the column to nvarchar(max)? Are there any storage engine specifics regarding ntext and LOB_DATA that cause the LOB_DATA pages not to be reclaimed? Why is it growing at such a disproportionate rate with regard to the amount of changes being made?
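For reference, this is a sketch of the kind of query used to see where the pages are going; the table name dbo.MyTable is a placeholder, and 'DETAILED' scans every level (including the LOB_DATA allocation units), so it can take a while on a big table:

SELECT  index_id,
        alloc_unit_type_desc,              -- IN_ROW_DATA, LOB_DATA, ROW_OVERFLOW_DATA
        page_count,
        page_count * 8 / 1024.0 AS size_mb -- pages are 8 KB each
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'DETAILED');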
I need clarification on the size of Snowflake's micro-partitions. In the official Snowflake documentation, it is mentioned as below.
Each micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the actual size in Snowflake is smaller because data is always stored compressed).
However, in some places I see the below statement on the micro-partition size.
Snowflake also stores multiple rows together in micro-partitions, which are variable-length data blocks of around 16 Mb size
What is the size of the data that a micro-partition can hold: 16 MB or 50-500 MB? Or does each micro-partition have a data block of 16 MB?
The key point is compression:
Benefits of Micro-partitioning
As the name suggests, micro-partitions are small in size (50 to 500 MB, before compression), which enables extremely efficient DML and fine-grained pruning for faster queries.
Columns are also compressed individually within micro-partitions. Snowflake automatically determines the most efficient compression algorithm for the columns in each micro-partition.
The size of 50-500 MB is for uncompressed data; the micro-partition itself holds around 16 MB (after compression).
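If you want to see this on one of your own tables, a rough sketch (the names MY_TABLE and SOME_COLUMN are placeholders) is to compare the compressed size Snowflake reports with the micro-partition count; SYSTEM$CLUSTERING_INFORMATION returns JSON whose total_partition_count field is the number of micro-partitions, and the column list is required when the table has no clustering key:

-- Compressed size Snowflake stores for the table (placeholder name MY_TABLE):
SELECT row_count, bytes
FROM information_schema.tables
WHERE table_name = 'MY_TABLE';

-- Micro-partition count is the total_partition_count field of the returned JSON:
SELECT SYSTEM$CLUSTERING_INFORMATION('MY_TABLE', '(SOME_COLUMN)');

Dividing bytes by total_partition_count gives the average compressed bytes per micro-partition.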
My application (industrial automation) uses SQL Server 2017 Standard Edition on a Dell T330 server with the following configuration:
Xeon E3-1200 v6
16 GB DDR4 UDIMMs
2 x 2 TB HDD 7200 RPM (RAID 1)
In this database, I am saving the following tables:
Table: tableHistory
Insert Range: Every 2 seconds
410 columns type float
409 columns type int
--
Table: tableHistoryLong
Insert Range: Every 10 minutes
410 columns type float
409 columns type int
--
Table: tableHistoryMotors
Insert Range: Every 2 seconds
328 columns type float
327 columns type int
--
Table: tableHistoryMotorsLong
Insert Range: Every 10 minutes
328 columns type float
327 columns type int
--
Table: tableEnergy
Insert Range: Every 700 milliseconds
220 columns type float
219 columns type int
Note:
When I generate reports/graphs, my application buffers the inserts, because the system cannot insert and query at the same time; the queries are quite heavy.
The columns hold values such as current, temperature, level, etc. This information is kept for one year.
Question
With this level of processing, can I expect performance problems?
Do I need better hardware due to high demand?
Can my application break at some point due to the hardware?
Your question may be closed as too broad, but I want to elaborate on the comments and offer additional suggestions.
How much RAM you need for adequate performance depends on the reporting queries. Factors include the number of rows touched, execution plan operators (sort, hash, etc.), and the number of concurrent queries. More RAM can also improve performance by avoiding IO, which is especially costly with spinning media.
A reporting workload (large scans) against a 1-2 TB database with traditional tables needs fast storage (SSD) and/or more RAM (hundreds of GB) to provide decent performance. The existing hardware is the worst-case scenario because data is unlikely to be cached with only 16 GB of RAM, and a single spindle can only read about 150 MB per second. Based on my rough calculation of the schema in your question, a monthly summary query of tableHistory will take about a minute just to scan 10 GB of data (assuming a clustered index on a date column). Query duration will increase with the number of concurrent queries, such that it would take at least 5 minutes per query with 5 concurrent users running the same query, due to disk bandwidth limitations. SSD storage can sustain multiple GB per second, so with the same query and RAM, the data transfer time for the query above would be under 5 seconds.
A columnstore (e.g. a clustered columnstore index), as suggested by @ConorCunninghamMSFT, will greatly reduce the amount of data transferred from storage because only the columns specified in the query are read, and the inherent columnstore compression will reduce both the size of data on disk and the amount transferred from disk. The compression savings will depend greatly on the actual column values, but I'd expect 50 to 90 percent less space compared to a rowstore table.
Reporting queries against measurement data are likely to specify date range criteria, so partitioning the columnstore by date will limit scans to the specified date range without a traditional b-tree index. Partitioning will also facilitate purging for the 12-month retention criteria with sliding window partition maintenance (partition TRUNCATE, MERGE, SPLIT) and thereby greatly improve performance of that process compared to a delete query.
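As a rough sketch only (the column name LogDate and the partition boundaries are placeholders, and this assumes tableHistory does not already have a conflicting clustered index), monthly partitioning plus a clustered columnstore index could look like this; the final statement shows the kind of sliding-window purge the partitioning enables:

-- Monthly partition function/scheme; boundary values shown are examples only.
CREATE PARTITION FUNCTION pfHistoryMonthly (datetime2(0))
    AS RANGE RIGHT FOR VALUES ('2019-01-01', '2019-02-01', '2019-03-01');

CREATE PARTITION SCHEME psHistoryMonthly
    AS PARTITION pfHistoryMonthly ALL TO ([PRIMARY]);

-- Clustered columnstore index stored on the partition scheme.
CREATE CLUSTERED COLUMNSTORE INDEX ccix_tableHistory
    ON dbo.tableHistory
    ON psHistoryMonthly (LogDate);

-- Purge the oldest month almost instantly instead of running a large DELETE.
TRUNCATE TABLE dbo.tableHistory WITH (PARTITIONS (1));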
I'm using SQL Server 2016 and I have a table in my database whose size is 120 GB. It has 300 columns, all of type NVARCHAR(MAX), and it holds 1,200,000 records. About 100 columns are NULL all the time or hold only a short value. My doubt is why 1,200,000 records take up 120 GB; is it because of the datatype?
This is an audit table that holds CDC historical information. On average, 10,000 records are inserted into this table per day. Because of this, my database size is increasing and SQL queries are slow. This is an audit table and is not used for any queries.
Please let me know why my table is so big.
Of course, it depends on how you are measuring the size of the table and what other operations occur.
You are observing about 100,000 bytes per record. That does seem large, but there are things you need to consider.
NVARCHAR(MAX) has a minimum size:
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and
can be a value from 1 through 4,000. max indicates that the maximum
storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is
two times the actual length of data entered + 2 bytes. The ISO
synonyms for nvarchar are national char varying and national character
varying.
Even the empty fields occupy 2 bytes each, plus a bit in the NULL bitmap. With 300 fields, that is 600-plus bytes right there (600 + 300 / 8 ≈ 638).
You may also have issues with pages that are only partially filled. This depends on how you insert data, the primary key, and system parameters.
And there are other considerations, depending on how you are measuring the size:
How large are the largest fields?
How often are rows occupying multiple pages (each additional page has additional overhead)?
You are using wide characters (2 bytes each), so the values take more space than their character counts suggest.
Is your estimate including indexes?
If you are measuring database size, you may be including log tables.
I would suggest that you have your DBA investigate the table to see if there are any obvious problems, such as many pages that are only partially filled.
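As a starting point for that investigation, something like the following sketch (the table name dbo.AuditTable is a placeholder) reports how full the pages actually are; avg_page_space_used_in_percent is only populated in SAMPLED or DETAILED mode:

SELECT  index_id,
        alloc_unit_type_desc,
        page_count,
        avg_page_space_used_in_percent   -- low values indicate mostly empty pages
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.AuditTable'), NULL, NULL, 'SAMPLED');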
Edit: updated answer upon clarification of the number of rows that the table really has.
Taking into account that 120 GB is 120,000 MB, you are getting 100 KB per row, that is, about 330 bytes for each column on average, which would usually be quite high, but not for a table with 300 nvarchar(max) columns (note that the nchar and nvarchar types take 2 bytes per char, not 1).
Also, you commented that one of those columns has a size of 2,000-90,000 characters (!). Supposing that column has on average 46k characters, we get a size of:
1,200,000 rows x 46k chars x 2 byte/char = 105GB only for the data of that column.
That leaves 15 GB for the rest of the columns, or about 13 KB per row, which is 44 bytes per column, quite low taking into account that almost all are nvarchar(max).
But those are only estimations; to get the real size of any column, use:
select sum(datalength(ColumnName))/1024.00 as SizeKB from TableName
And all of this only takes the data into account, which is not exact because the database structures need space too. For example, indexes add to the total size of a table; roughly, they take the sum of the sizes of the columns included in the index (for example, if you defined an index on the big column it would take another 100 GB).
You can find out how much space the whole table uses with the following script from another question (it will show the size for each table of the DB):
Get size of all tables in database
Check the column UsedSpaceMB; that is the size needed for the data and the indexes. If for some reason the table is using more space (usually because you deleted data), you get that size in UnusedSpaceMB (a bit of unused space is normal).
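For reference, a sketch along the lines of that script (not the exact one from the linked question) could look like this; the join condition follows the documented mapping of sys.allocation_units.container_id to hobt_id for in-row/row-overflow data and to partition_id for LOB data:

SELECT  s.name + N'.' + t.name AS TableName,
        SUM(a.used_pages)  * 8 / 1024.0                        AS UsedSpaceMB,
        (SUM(a.total_pages) - SUM(a.used_pages)) * 8 / 1024.0  AS UnusedSpaceMB
FROM    sys.tables t
JOIN    sys.schemas s    ON s.schema_id = t.schema_id
JOIN    sys.partitions p ON p.object_id = t.object_id
JOIN    sys.allocation_units a
        ON  (a.type IN (1, 3) AND a.container_id = p.hobt_id)       -- in-row / row-overflow
         OR (a.type = 2       AND a.container_id = p.partition_id)  -- LOB data
GROUP BY s.name, t.name
ORDER BY UsedSpaceMB DESC;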
I am trying to estimate database size for SQL Server 2008 R2. I have a table with one INTEGER primary key and 39 text columns of type VARCHAR(MAX).
I have searched and found two statements.
A table can contain a maximum of 8,060 bytes per row.
Varchar(max) has a maximum storage capacity of 2 gigabytes.
I am confused about how to estimate the size. How can I store 2 gigabytes in each column if there is a limit on the row size?
I am not a database expert, so maybe I am not getting it correctly.
Can anyone explain How to estimate it?
Thank you
In Microsoft SQL Server, data (which includes indexes) is stored in one or more 8 KB (8192-byte) "pages". There are different types of pages that can be used to handle various situations (e.g. Data, LOB, Index, AllocationMap, etc.). Each page has a header which is meta-data about that page and what it contains.
Most data is stored in the row itself, and one or more of these rows are in turn stored in a page for "in-row data". Due to the space taken by the row header, the largest a row can be (for "in-row" data) is 8060 bytes.
However, not all data is stored in the row. For certain datatypes, the data can actually be stored on a "LOB data" page while a pointer is left in the "in-row" data:
Legacy / deprecated LOB types that nobody should be using anymore (TEXT, NTEXT, and IMAGE), by default, always store their data on LOB pages and always use a 16-byte pointer to that LOB page.
The newer LOB types (VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), and XML), by default, will attempt to fit the data directly in the row if it will fit. Else it will store the data on LOB pages and use a pointer of 24 - 72 bytes (depending on the size of the LOB data).
This is how you could store up to 78 GB + 4 bytes (can't forget about the INT Primary Key ;-) in a single row: the max row size will be between 940 bytes ((39 * 24) + 4) and 2812 bytes ((39 * 72) + 4). But again, that is just the maximum range; if the data in each of the 39 VARCHAR(MAX) fields is just 10 bytes, then all of the data will be stored in-row and the row size will be 394 bytes ((39 * 10) + 4).
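To make the distinction concrete, here is a small throwaway sketch (the table dbo.WideDemo is invented for illustration): each value is 100,000 bytes, far beyond the 8,060-byte in-row limit, yet the insert succeeds because the values are pushed to LOB pages and only pointers remain in the row:

CREATE TABLE dbo.WideDemo
(
    Id INT NOT NULL PRIMARY KEY,
    C1 VARCHAR(MAX) NULL,
    C2 VARCHAR(MAX) NULL,
    C3 VARCHAR(MAX) NULL
);

-- CONVERT to VARCHAR(MAX) first so REPLICATE is not truncated at 8,000 bytes.
INSERT INTO dbo.WideDemo (Id, C1, C2, C3)
VALUES (1,
        REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000),
        REPLICATE(CONVERT(VARCHAR(MAX), 'y'), 100000),
        REPLICATE(CONVERT(VARCHAR(MAX), 'z'), 100000));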
Given that you have so many variable-length fields (whether they are MAX or not), the only way to estimate the size of future rows is to have a good idea about what data you will be storing in this table. Although, a table with all, or even mostly, MAX datatypes implies that nobody really has any idea what is going to be stored in this table.
Along those lines, it should be pointed out that this is a horribly modeled table / horrible use of MAX datatype fields, and should be refactored.
For more details about how data pages are structured, please see my answer to the following DBA.StackExchange question:
SUM of DATALENGTHs not matching table size from sys.allocation_units
When you use VARCHAR(MAX), data can be stored within the row, on its data page, if the contents are under roughly 8,000 bytes. If the contents are larger, the data is stored as a LOB ("off the page"), and only a reference to the actual location is stored within the row. I honestly don't know of any decent way to estimate the size of your entire database, considering the data may be any length in a VARCHAR(MAX) column.
I currently have a database that is 20GB in size.
I've run a few scripts which show each table's size (and other incredibly useful information such as index details), and the biggest table is 1.1 million records which takes up 150 MB of data. We have fewer than 50 tables, most of which take up less than 1 MB of data.
After looking at the size of each table, I don't understand why the database shouldn't be 1 GB in size after a shrink. The amount of available free space that SQL Server (2005) reports is 0%. The log mode is set to simple. At this point my main concern is that I feel like I have 19 GB of unaccounted-for used space. Is there something else I should look at?
Normally I wouldn't care and would make this a passive research project except this particular situation calls for us to do a backup and restore on a weekly basis to put a copy on a satellite (which has no internet, so it must be done manually). I'd much rather copy 1GB (or even if it were down to 5GB!) than 20GB of data each week.
sp_spaceused reports the following:
database_name: Navigator-Production, database_size: 19184.56 MB, unallocated space: 3.02 MB
And the second part of it:
reserved: 19640872 KB, data: 19512112 KB, index_size: 108184 KB, unused: 20576 KB
I've found a few other scripts (such as the ones from a couple of the server database size questions here), and they all report the same information as found above or below.
The script I am using is from SqlTeam. Here is the header info:
* BigTables.sql
* Bill Graziano (SQLTeam.com)
* graz#<email removed>
* v1.11
The top few tables show this (table, rows, reserved space, data, index, unused, etc):
Activity 1143639 131 MB 89 MB 41768 KB 1648 KB 46% 1%
EventAttendance 883261 90 MB 58 MB 32264 KB 328 KB 54% 0%
Person 113437 31 MB 15 MB 15752 KB 912 KB 103% 3%
HouseholdMember 113443 12 MB 6 MB 5224 KB 432 KB 82% 4%
PostalAddress 48870 8 MB 6 MB 2200 KB 280 KB 36% 3%
The rest of the tables are either the same in size or smaller. No more than 50 tables.
Update 1:
- All tables use unique identifiers. Usually an int incremented by 1 per row.
I've also re-indexed everything.
I ran the DBCC shrink command as well as updating the usage, before and after, over and over. An interesting thing I found is that when I restarted the server, confirmed no one was using it (and no maintenance procs were running; this is a very new application, under a week old), and went to run the shrink, every now and then it would say something about data having changed. Googling yielded too few useful answers, with the obvious ones not applying (it was 1 am and I had disconnected everyone, so it seems impossible that was really the case). The data was migrated via C# code which basically looked at another server and brought things over. The quantity of deletes, at this point in time, is probably under 50k rows. Even if those were the biggest rows, that wouldn't be more than 100 MB, I would imagine.
When I go to shrink via the GUI it reports 0% available to shrink, indicating that I've already gotten it as small as it thinks it can go.
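To double-check that figure, one option is a direct per-file query, roughly like this sketch (size is counted in 8 KB pages, hence the division by 128):

SELECT  name,
        size / 128.0                                                        AS FileSizeMB,
        size / 128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int) / 128.0 AS FreeSpaceMB
FROM    sys.database_files;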
Update 2:
sp_spaceused 'Activity' yields this (which seems right on the money):
Activity: rows 1143639, reserved 134488 KB, data 91072 KB, index_size 41768 KB, unused 1648 KB
Fill factor was 90.
All primary keys are ints.
Here is the command I used to 'updateusage':
DBCC UPDATEUSAGE(0);
Update 3:
Per Edosoft's request:
Image: rows 111975, total_pages 2407773, Size(Kb) 19262184
It appears as though the Image table accounts for the 19 GB.
I don't understand what this means though.
Is it really 19GB or is it misrepresented?
Update 4:
Talking to a co-worker, I found out that it's because of the pages, as someone else here has also suggested might be the case. The only index on the Image table is a clustered PK. Is this something I can fix, or do I just have to deal with it?
The regular script shows the Image table to be 6MB in size.
Update 5:
I think I'm just going to have to deal with it after further research. The images have been resized to roughly 2-5 KB each; on a normal file system they don't consume much space, but in SQL Server they seem to consume considerably more. The real answer, in the long run, will likely be separating that table into another partition or something similar.
Try this query:
SELECT object_name(object_id) AS name, rows, total_pages,
total_pages * 8192 / 1024 as [Size(Kb)]
FROM sys.partitions p
INNER JOIN sys.allocation_units a
ON p.partition_id = a.container_id
You may also want to update the usage in the systables before you run the query to make sure that they are accurate.
DECLARE @DbName NVARCHAR(128)
SET @DbName = DB_NAME(DB_ID())
DBCC UPDATEUSAGE(@DbName)
What is the fill factor you're using in your reindexing? It has to be high, from 90-100% depending on the PK datatype.
If your fill factor is low, then you'll have a lot of half-empty pages which can't be shrunk down.
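A sketch of a rebuild that sets a high fill factor (the Image table name comes from the question; the dbo schema is an assumption):

-- Rebuild every index on the table with a high fill factor so pages are packed.
ALTER INDEX ALL ON dbo.[Image] REBUILD WITH (FILLFACTOR = 95);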
Did you try the DBCC command to shrink the database? If you transfer all the data to an empty database, is it also 20 GB?
A database uses a page-based storage layout, so you might be running into a lot of slack (empty space within pages) due to heavy row removal: if the DBMS expects rows to be inserted at that spot, it might be better to leave the spots open. Do you use uniqueidentifier-based PKs which have a clustered index?
You could also try doing a database vacuum; this can often yield large space improvements if you have never done it before.
Hope this helps.
Have you checked the stats under the "Shrink Database" dialog? In SQL Server Management Studio (2005 / 2008), right-click the database, click Tasks -> Shrink -> Database. That'll show you how much space is allocated to the DB, and how much of that allocated space is currently unused.
Have you ensured that the space isn't being consumed by your transaction log? If you're in full recovery mode, the t-log won't be shrinkable until you perform a transaction log backup.
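If the log does turn out to be the culprit, a sketch of the usual sequence looks like this; the backup path and the logical log file name are placeholders (check sys.database_files for the real logical name):

BACKUP LOG [Navigator-Production]
    TO DISK = N'D:\Backups\Navigator-Production_log.trn';  -- placeholder path

-- Shrink the log file back down to ~1 GB (target size is in MB).
DBCC SHRINKFILE (N'Navigator-Production_log', 1024);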