I am wondering how memory allocation works in a database: is memory assigned based on the DataType of the column, or according to the actual value? As a .NET developer, my understanding is that memory is allocated based on the DataType, not on the value. Now my question is how memory allocation is handled on the DB side.
FOR EXAMPLE
| id1 | id2 | id3 | name
| NULL | NULL | NULL | James Bond
id1, id2, and id3 have NULL values. What will be the memory size of this row? Will memory be assigned to the columns holding NULL values?
Edit
Database server: SQL Server 2008 R2
Thanks in advance
Physical storage for SQL Server is in a unit called a "page". There are several structures within a page, those structures are called "records". There are several types of records, the type of record you seem to be asking about is called a "data record".
(There are also several other types of records within a page: index records, forwarding records, ghost records, text records, and other internal record structures such as allocation bitmaps and file headers.)
To answer your question, without delving into all those details, and neglecting a discussion of "row compression" and "page compression"...
One part of the record is for the "fixed length" columns, where the columns that are defined with fixed-length datatypes are stored (integer, float, date, char(n), etc.). As the name implies, a fixed amount of storage is reserved for each of those columns. Another part of the record is the "variable length" portion, where columns with variable-length datatypes are stored: a two-byte count of the number of variable-length columns, then a two-byte offset to the end of each column's value, and then the values themselves.
Q: What will be the memory size of this row?
A: In your case, with a table of four columns, there will be eight bytes for the record header, some fixed number of bytes for the fixed-length columns, three bytes for the column count and NULL bitmap, and a variable amount of storage for the variable-length columns.
The "memory size for the row" is really determined by the datatypes of the columns, and for variable length columns, the values that are stored.
(And if any indexes exist, there's also space required in index records.)
Q: Will it assign memory to the columns having NULL values?
A: If the columns are fixed length, yes. If the columns are variable length, yes, at a minimum the two-byte offset to the end of the value, even if the value is zero length.
SQL Server manages memory in "pages"... In terms of estimating memory requirements, the more pertinent questions are "how many rows fit in a page?" and "how many pages are required to store my rows?"
A page that contains one data record requires 8KB of memory. A page that contains a dozen data records also requires 8KB of memory.
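If you would rather measure than estimate, one hedged way (a sketch only; the column datatypes below are assumptions, since the question does not give them) is to build the example table and ask sys.dm_db_index_physical_stats for the record sizes it actually observes:

-- Hypothetical version of the table from the question
CREATE TABLE dbo.DemoRow
(
    id1  INT NULL,           -- fixed-length: 4 bytes reserved even when NULL
    id2  INT NULL,
    id3  INT NULL,
    name VARCHAR(50) NULL    -- variable-length: storage follows the value
);

INSERT INTO dbo.DemoRow (id1, id2, id3, name)
VALUES (NULL, NULL, NULL, 'James Bond');

-- DETAILED mode reports the min/max/avg record size in bytes
SELECT index_type_desc,
       min_record_size_in_bytes,
       max_record_size_in_bytes,
       avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.DemoRow'), NULL, NULL, 'DETAILED');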
I have found the answer on MSDN
Use Sparse Columns
The SQL Server Database Engine uses the SPARSE keyword in a column definition to optimize the storage of values in that column. Therefore, when the column value is NULL for any row in the table, the values require no storage.
http://msdn.microsoft.com/en-us/library/cc280604.aspx
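For illustration, declaring a sparse column is just an extra keyword in the column definition. This is only a sketch; the table and column names are made up, and note that sparse columns must be nullable and trade cheaper NULLs for a little extra overhead on non-NULL values:

CREATE TABLE dbo.AgentContact
(
    AgentId INT IDENTITY(1,1) PRIMARY KEY,
    Name    VARCHAR(100) NOT NULL,
    Phone   VARCHAR(20) SPARSE NULL,  -- NULLs in sparse columns take no storage
    Fax     VARCHAR(20) SPARSE NULL
);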
Related
I was reading a book:
For example, when a column is defined as VARCHAR(25), the maximum number of characters supported is 25, but in practice, the actual number of characters in the string determines the amount of storage. Because storage consumption
for these data types is less than that for fixed-length types, read operations are faster. However, updates might result in row expansion, which might result in data movement outside the current page. Therefore, updates of data having variable-length data types are less efficient than updates of data having fixed-length data types.
I can understand that storage consumption for varchar is less than that for char, but why is it slower than char when updating records? What does row expansion mean, and what actually happens when a row expands?
Let's say we have a suburb table which has two columns, zipcode char(5) and name varchar, and let's say we need to update a row with zipcode 10005 and name 'NYC'. We only set 3 characters for the name column; shouldn't that be more efficient than the zipcode column, which requires 5 characters?
Rows are laid out with the fixed size columns first, at fixed offsets from the start of the row. Then (after some important bytes in the middle) the variable sized data is placed at the end. Because it's variable sized, the actual offset to the data cannot be computed for the whole table (like the fixed data) but has to be computed on a row-by-row basis.
And if a varchar(5)¹ is storing NYC and is then asked to store NYCX, it may find that there's not a spare byte at the end of NYC (it's being used for another column), so the row has to expand by moving everything after it one byte further along to make space for the extra byte.
¹ I notice that in one of your examples you failed to specify a length. Please drill into yourself that that's a bad habit.
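To make this concrete, here is a small sketch (table and values invented, lengths specified this time) of the kind of update that can force a row to expand:

CREATE TABLE dbo.Suburb
(
    zipcode CHAR(5)     NOT NULL,  -- fixed length: always 5 bytes
    name    VARCHAR(25) NOT NULL   -- variable length: storage follows the value
);

INSERT INTO dbo.Suburb (zipcode, name) VALUES ('10005', 'NYC');

-- Updating the fixed-length column rewrites bytes in place; the row size does not change.
UPDATE dbo.Suburb SET zipcode = '10006' WHERE name = 'NYC';

-- Updating the variable-length column to a longer value grows the row;
-- if the page has no free space left, data may have to move.
UPDATE dbo.Suburb SET name = 'New York City' WHERE zipcode = '10006';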
I'm using SQL Server 2016 and I have a table in my database whose size is 120 GB. It has 300 columns, all of type NVARCHAR(MAX), and it holds 1,200,000 records. Around 100 of the columns are NULL all the time or hold only a short value. My doubt is: why do 1,200,000 records take 120 GB? Is it because of the datatype?
This is an audit table holding CDC historical information. On average, 10,000 records are inserted into it per day. Because of this, my database size is increasing and SQL queries are slow. It is an audit table and is not used for any queries.
Please let me know the reason why my table is so big.
Of course, it depends on how you are measuring the size of the table and what other operations occur.
You are observing about 100,000 bytes (roughly 100 KB) per record. That does seem large, but there are things you need to consider.
NVARCHAR(MAX) has a minimum size:
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
Even the empty fields occupy 2 bytes, plus a bit in the NULL bitmap. With 300 fields, that is 600-plus bytes right there (300 × 2 = 600 bytes for the offsets, plus roughly 300 / 8 ≈ 38 bytes for the NULL bitmap).
You may also have issues with pages that are only partially filled. This depends on how you insert data, the primary key, and system parameters.
And there are other considerations, depending on how you are measuring the size:
How large are the largest fields?
How often are rows occupying multiple pages (each additional page has additional overhead)?
You are using wide (Unicode) characters, so strings take twice as many bytes as their character count suggests.
Is your estimate including indexes?
If you are measuring database size, you may also be including the transaction log and other tables.
I would suggest that you have your DBA investigate the table to see if there are any obvious problems, such as many pages that are only partially filled.
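One hedged way to start that investigation (a sketch only; dbo.AuditTable is a placeholder for the real table name) is to look at how full the pages actually are:

-- SAMPLED mode populates page-fullness and record-size columns without scanning everything
SELECT index_id,
       alloc_unit_type_desc,            -- IN_ROW_DATA, LOB_DATA, ROW_OVERFLOW_DATA
       page_count,
       avg_page_space_used_in_percent,  -- low values indicate partially filled pages
       avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.AuditTable'), NULL, NULL, 'SAMPLED');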
Edit: updated answer upon clarification of the number of rows that the table really has.
Taking into account that 120 GB is 120,000 MB, you are getting 100 KB per row, which is about 330 bytes per column on average. That is usually quite high, but not for a table with 300 nvarchar(max) columns (note that the nchar and nvarchar types take 2 bytes per character, not 1).
Also, you commented that one of those columns holds 2,000-90,000 characters (!). Supposing that column averages 46k characters, we get a size of:
1,200,000 rows × 46k chars × 2 bytes/char = 105 GB just for the data of that column.
That leaves 15 GB for the rest of the columns, or about 13 KB per row, which is 44 bytes per column; quite low considering that almost all of them are nvarchar(max).
But those are only estimates. To get the real size of any column, use:
select sum(datalength(ColumnName))/1024.00 as SizeKB from TableName
And all of this only takes the data into account, which is not the whole picture, because the database structures need space too. For example, indexes add to the total size of a table; roughly, they take the sum of the sizes of the columns included in the index (for example, if you defined an index on that big column, it would take another 100 GB).
You can find out how much space the whole table uses with the following script from another question (it shows the size of each table in the DB):
Get size of all tables in database
Check the column UsedSpaceMB; that is the size needed for the data and the indexes. If for some reason the table is using more space (usually because you deleted data), you will see it in UnusedSpaceMB (a bit of unused space is normal).
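For reference, that kind of per-table size report is essentially a query along these lines (a sketch only, not necessarily the exact script from the linked question); total_pages and used_pages are counted in 8 KB pages, hence the * 8 / 1024.0:

SELECT
    t.name AS TableName,
    SUM(a.total_pages) * 8 / 1024.0 AS TotalSpaceMB,
    SUM(a.used_pages)  * 8 / 1024.0 AS UsedSpaceMB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 / 1024.0 AS UnusedSpaceMB
FROM sys.tables t
JOIN sys.indexes i          ON t.object_id = i.object_id
JOIN sys.partitions p       ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name
ORDER BY TotalSpaceMB DESC;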
I am trying to estimate database size for SQL Server 2008 R2. I have a table with one INTEGER primary key and 39 text columns of type VARCHAR(MAX).
I have searched and found two statements.
A table can contain a maximum of 8,060 bytes per row.
Varchar(max) has a maximum storage capacity of 2 gigabytes.
I am confused about how to estimate the size. How can I store 2 gigabytes in each column if there is a limit on the row size?
I am not a database expert; maybe I am not getting it correctly.
Can anyone explain How to estimate it?
Thank you
In Microsoft SQL Server, data (which includes indexes) is stored in one or more 8 KB (8,192-byte) "pages". There are different types of pages that handle various situations (Data, LOB, Index, AllocationMap, etc.). Each page has a header, which is metadata about that page and what it contains.
Most data is stored in the row itself, and one or more of these rows are in turn stored in a page for "in-row data". Due to the space taken by the row header, the largest a row can be (for "in-row" data) is 8060 bytes.
However, not all data is stored in the row. For certain datatypes, the data can actually be stored on a "LOB data" page while a pointer is left in the "in-row" data:
Legacy / deprecated LOB types that nobody should be using anymore (TEXT, NTEXT, and IMAGE), by default, always store their data on LOB pages and always use a 16-byte pointer to that LOB page.
The newer LOB types (VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), and XML), by default, will attempt to fit the data directly in the row if it will fit. Else it will store the data on LOB pages and use a pointer of 24 - 72 bytes (depending on the size of the LOB data).
This is how you could store up to 78 GB + 4 bytes (can't forget about the INT Primary Key ;-) in a single row: the max row size will be between 940 bytes ((39 * 24) + 4) and 2812 bytes ((39 * 72) + 4). But again, that is just the maximum range; if the data in each of the 39 VARCHAR(MAX) fields is just 10 bytes, then all of the data will be stored in-row and the row size will be 394 bytes ((39 * 10) + 4).
Given that you have so many variable-length fields (whether they are MAX or not), the only way to estimate the size of future rows is to have a good idea about what data you will be storing in this table. Although, a table with all, or even mostly, MAX datatypes implies that nobody really has any idea what is going to be stored in this table.
Along those lines, it should be pointed out that this is a horribly modeled table / horrible use of MAX datatype fields, and should be refactored.
For more details about how data pages are structured, please see my answer to the following DBA.StackExchange question:
SUM of DATALENGTHs not matching table size from sys.allocation_units
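If you want to see how much of a table actually lands on in-row pages versus LOB pages, a query along these lines gives the breakdown by allocation-unit type (a sketch using the common simplified join; dbo.YourTable is a placeholder):

SELECT au.type_desc,                       -- IN_ROW_DATA, LOB_DATA, ROW_OVERFLOW_DATA
       SUM(au.total_pages) * 8 AS TotalKB, -- pages are 8 KB each
       SUM(au.used_pages)  * 8 AS UsedKB
FROM sys.allocation_units au
JOIN sys.partitions p ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.YourTable')
GROUP BY au.type_desc;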
When you use VARCHAR(MAX), the data can be stored in-row, i.e. within the 8 KB page, if the contents are < 8,000 bytes. If the contents are > 8,000 bytes, the data is stored as a LOB ("off the page"), and only a pointer to the actual location is stored within the row. I honestly don't know of any decent way to estimate the size of your entire database, considering the data may be any length in a VARCHAR(MAX) column.
I have a SQL Server 2008 database that stores millions of rows. There are several NVARCHAR columns that will never exceed the current max length of the column, nor get close to it due to application constraints.
i.e.
The Address NVARCHAR field has a length of 50 characters, but it'll never exceed 32 characters.
Is there a performance benefit or space-saving benefit to reducing the size of the NVARCHAR column to what its actual maximum length will be (i.e. in the case of the Address field, 32 characters)? Or will it not make a difference, since it's a variable-length field?
Setting the number of characters in NVARCHAR is mainly for validation purposes. If there is some reason why you don't want the data to exceed 50 characters then the database will enforce that rule for you by not allowing extra data.
If the total row size exceeds a threshold, it can affect performance, so by restricting the length you could keep your row size under that threshold. In your case, though, that does not seem to matter.
The reason restricting row size can help is that SQL Server can then fit more rows onto a page, which results in less disk I/O, and more rows can be cached in memory.
Also, the maximum row size in SQL Server is 8 KB, as that is the size of a page and rows cannot cross page boundaries. If you insert a row that exceeds 8 KB, the extra data will be stored in a row-overflow page, which will likely have a negative effect on performance.
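As a rough illustration (the row sizes here are assumed): a page offers roughly 8,060 bytes for row data, so 400-byte rows fit about 20 to a page while 200-byte rows fit about 40, meaning a scan over the smaller rows touches roughly half as many pages. Note, though, that an NVARCHAR(50) column holding a 32-character value already stores only 32 × 2 + 2 = 66 bytes, so the declared length itself does not change the stored size.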
There is no expected performance or space-saving benefit from reducing your n/var/char column definitions to their maximum length. However, there may be other benefits.
The column won't accidentally have a longer value inserted without generating an error (desirable for the "fail fast" characteristic of well-designed systems).
The column communicates to the next developer examining the table something about the data, which aids understanding. No developer will be confused about the purpose of the data or have to waste time determining whether the code's field validation rules are wrong or the column definition is wrong (as they logically should match).
If your column does need to be extended in length, you can do so with the potential consequences ascertained in advance. A professional who is well-versed in databases can use the opportunity to check whether upcoming values that need the new column length will have a negative impact on existing rows or on query performance, as the amount of data per row affects the number of reads required to satisfy queries.
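If you do decide to tighten the definition, the change itself is a one-liner. This is only a sketch: the table name, and the assumption that the column is NOT NULL, are invented, and the ALTER will fail if any existing value is longer than the new limit:

-- Shrink the declared length; existing data must already fit in 32 characters
ALTER TABLE dbo.Customer
    ALTER COLUMN Address NVARCHAR(32) NOT NULL;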
I have an issue where one column in a database might be anything from 10 to 10,000 bytes in size. Do you know if PostgreSQL supports sparse data (i.e. will it always set aside the 10,000 bytes for every entry in the column, or only the space that is required for each entry)?
Postgres will store long varlena types in an extended storage called TOAST.
In the case of strings, it keeps things inline up to 126 bytes (which can mean fewer than 126 characters for multibyte data), and then sends them to the external storage.
You can see where the data is stored using psql:
\dt+ yourtable
As an aside, note that from Postgres' standpoint, there's absolutely no difference (with respect to storage) between declaring a column's type as varchar or varchar(large_number) -- it'll be stored in the exact same way. There is, however, a very slight performance penalty in using varchar(large_number) because of the string length check.
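If you want to see where the bytes actually end up, a couple of the size functions make the split visible. This is only a sketch, reusing the yourtable name from above; the difference between the table size and the main heap alone is the TOAST (and free-space/visibility map) storage:

-- Total size including TOAST and indexes, table plus TOAST only, and the main heap alone
SELECT pg_size_pretty(pg_total_relation_size('yourtable')) AS total_with_toast_and_indexes,
       pg_size_pretty(pg_table_size('yourtable'))          AS table_plus_toast,
       pg_size_pretty(pg_relation_size('yourtable'))       AS main_heap_only;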
use varchar or text types - these use only the space actually required to store the data (plus a small per-value overhead of 1 to 4 bytes to store the length)
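To check the per-value storage directly, pg_column_size() reports the number of bytes used to store a particular value (possibly compressed). A sketch, with payload standing in for your wide column:

-- Bytes actually used by each stored value in the column
SELECT pg_column_size(payload) AS payload_bytes
FROM yourtable;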