How much size "Null" value takes in SQL Server - sql-server

I have a large table with, say, 10 columns. 4 of them remain NULL most of the time. My question is whether a NULL value takes up any space in bytes or none at all. I read a few articles, and some of them say:
http://www.sql-server-citation.com/2009/12/common-mistakes-in-sql-server-part-4.html
There is a misconception that if we have the NULL values in a table it doesn't occupy storage space. The fact is, a NULL value occupies space – 2 bytes
SQL: Using NULL values vs. default values
A NULL value in databases is a system value that takes up one byte of storage and indicates that a value is not present as opposed to a space or zero or any other default value.
Can you please guide me regarding the size taken by null value.

If the field is fixed width storing NULL takes the same space as any other value - the width of the field.
If the field is variable width the NULL value takes up no space.
In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.
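As a rough illustration of the rule above, this Python sketch models how many bytes a NULL occupies in the data portion of a row. The function name and its parameter are made up for this example, and the model ignores the null bitmap bit as well as any compression or SPARSE settings.

```python
from typing import Optional


def null_data_bytes(fixed_width: Optional[int]) -> int:
    """Bytes a NULL occupies in the row's data area (simplified model).

    fixed_width is the declared byte width for fixed-length types
    (e.g. 4 for int, 10 for char(10)); pass None for variable-length types.
    """
    if fixed_width is None:
        return 0  # variable-length column: no data bytes are stored for NULL
    return fixed_width  # fixed-length column: the full width is reserved anyway


print(null_data_bytes(4))     # int NULL
print(null_data_bytes(10))    # char(10) NULL
print(null_data_bytes(None))  # varchar NULL
```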
The reason for the discrepancies that you have observed in information from other sources:
The start of the first article is a bit misleading. The article is not talking about the cost of storing a NULL value, but the cost of having the ability to store a NULL (i.e. the cost of making a column nullable). It's true that it costs something in storage space to make a column nullable, but once you have done that it takes less space to store a NULL than it takes to store a value (for variable width columns).
The second link seems to be a question about Microsoft Access. I don't know the details of how Access stores NULLs but I wouldn't be surprised if it is different to SQL Server.

The following link claims that if the column is variable length, i.e. varchar then NULL takes 0 bytes (plus 1 byte is used to flag whether value is NULL or not):
How does SQL Server really store NULL-s
The above link, as well as the below link, claim that for fixed length columns, i.e. char(10) or int, a value of NULL occupies the length of the column (plus 1 byte to flag whether it's NULL or not):
Data Type Performance Tuning Tips for Microsoft SQL Server
Examples:
If you set a char(10) to NULL, it occupies 10 bytes (zeroed out)
An int takes 4 bytes (also zeroed out).
A varchar(1 million) set to NULL takes 0 bytes (+ 2 bytes)
Note: on a slight tangent, the storage size of varchar is the length of data entered + 2 bytes.

From this link:
Each row has a null bitmap for columns that allow nulls. If the row in that column is null then a bit in the bitmap is 1 else it's 0.
For variable size datatypes the actual size is 0 bytes.
For fixed size datatype the actual size is the default datatype size in bytes set to default value (0 for numbers, '' for chars).

Storing a NULL value does not take any space.
"The fact is, a NULL value occupies space – 2 bytes."
This is a misconception: that's 2 bytes per row, and I'm pretty sure that all rows use those 2 bytes regardless of whether there are any nullable columns.
"A NULL value in databases is a system value that takes up one byte of storage"
This is talking about databases in general, not specifically SQL Server. SQL Server does not use 1 byte to store NULL values.

Even though this question is specifically tagged as SQL Server 2005, being that it is now 2021, it should be pointed out that it is a "trick question" for any version of SQL Server after 2005.
This is because if either ROW or PAGE compression is used, or if the column is defined as SPARSE, then it will take "no space" in the actual row to store a 'NULL value'. These were added in SQL Server 2008.
The implementation notes for ROW COMPRESSION (which is a prerequisite for PAGE COMPRESSION) state:
NULL and 0 values across all data types are optimized and take no bytes [1].
While there is still minimal metadata (4 bits per column + (record overhead / columns)) stored per non-sparse column in each physical record [2], it's strictly not the value and is required in all cases [3].
SPARSE columns with a NULL value take up no space and no relevant per-row metadata (as the number of SPARSE columns increases), albeit with a trade-off for non-NULL values.
As such, it is hard to "count" space without analyzing the actual DB usage stats. The average bytes per row will vary based on precise column types, table/index rebuild settings, actual data and duplication, fill capacity, effective page utilization, fragmentation, LOB usage, etc. and is often a more useful metric.
[1] SQLite uses a similar approach to have effectively-free NULL values.
[2] A brief description of the technical layout used in ROW (and thus PAGE) compression can be found in "SQL Server 2012 Internals: Special Storage":
Following the 1 or 2 bytes for the number of columns is the CD array, which uses 4 bits [of metadata] for each column in the table to represent information about the length of the column .. 0 (0x0) indicates that the corresponding column is NULL.
[3] Fun fact: with ROW compression, bit column values exist entirely in the corresponding 4-bit metadata.

Related

Storage of Bit columns for null values?

The Microsoft Documentation at https://learn.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-2017 says:
An integer data type that can take a value of 1, 0, or NULL.
The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.
The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.
Converting to bit promotes any nonzero value to 1.
How is it possible to store 1, 0 and NULL in a single bit?
Quoting a canonical answer by @MarkByers in the question "How much size “Null” value takes in SQL Server" regarding how SQL Server stores NULL in general:
In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.
So, I would expect the BIT type to behave the same as any other column, meaning that there would be a separate bit to keep track of whether the column is NULL or not. Therefore, a BIT column in SQL Server actually uses two bits to keep track of the three values.
There is a NULL bitmap mask in the row header that keeps track of which columns are NULL.
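The packing rule quoted from the documentation (1 byte for up to 8 bit columns, 2 bytes for 9 to 16, and so on) is a ceiling division by 8. A minimal Python sketch, with a hypothetical function name, that covers only the value bits; the NULL indicators live separately in the row's null bitmap:

```python
def bit_column_bytes(bit_columns: int) -> int:
    """Bytes used to store the values of `bit_columns` bit columns in one row."""
    if bit_columns < 0:
        raise ValueError("bit_columns must be non-negative")
    return (bit_columns + 7) // 8  # ceiling of bit_columns / 8


for n in (1, 8, 9, 16, 17):
    print(n, "bit columns ->", bit_column_bytes(n), "byte(s)")
```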

SQL Server data types to store strings and take less space

I have a question regarding the data types available in SQL Server for storing data in the database. Since I'm dealing with a database that is quite large and tends to grow beyond 150 GB of data, I need to pay close attention and save every bit of space on the server's hard drive so that the database doesn't take up all the precious space. So my questions are as follows:
Which data type is best for storing an 80-200 character string in the database?
I'm aware of, for example, varchar(200) and nvarchar(200), where nvarchar supports Unicode characters. Which of these would take up less space in the database? Or is there a third data type that I'm not aware of that I could use to store the data (if I know for a fact that the strings I store are just combinations of numbers and letters, without any special characters)?
Are there other techniques I could use to save space so that the database doesn't expand rapidly?
Can someone help me out with this?
P.S. Guys, I have a 4th question as well:
If, for example, I have an nvarchar(max) column in a table and the entered record takes up only 100 characters, how much space is reserved for that record?
Let's say that I have an ID of the following form: 191697193441. Would it make more sense to store this number as varchar(200) or bigint?
The size needed for nvarchar is 2 bytes per character, as it represents unicode data. varchar needs 1 byte per character. The storage size is the actual number of characters entered + 2 bytes overhead. This is also true for varchar(max).
From https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql:
varchar [ ( n | max ) ] Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes.
So for your 4th question, nvarchar would need 100 * 2 + 2 = 202 bytes, varchar would need 100 * 1 + 2 = 102 bytes.
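The arithmetic above can be written as a small Python sketch. The function name is hypothetical, and the model assumes 1 byte per character for varchar (a single-byte collation) and 2 bytes per character for nvarchar (no supplementary characters), plus the 2-byte overhead quoted from the documentation.

```python
def var_string_bytes(char_count: int, bytes_per_char: int, overhead: int = 2) -> int:
    """Approximate storage for a variable-length string value."""
    return char_count * bytes_per_char + overhead


print(var_string_bytes(100, 1))  # varchar holding 100 characters
print(var_string_bytes(100, 2))  # nvarchar holding 100 characters

# The numeric ID from the question, stored as text versus as a bigint.
id_digits = len(str(191697193441))
BIGINT_BYTES = 8  # bigint is a fixed 8-byte integer type
print(var_string_bytes(id_digits, 1), "bytes as varchar vs", BIGINT_BYTES, "bytes as bigint")
```

Under these assumptions the ID is smaller as a bigint than as a varchar, regardless of the declared varchar length.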
There's no performance or data size difference as they're variable length data types, so they'll only use the space they need.
Think of the size parameter as more of a useful constraint. For example, if you have a surname field, you can reasonably expect 50 characters to be a sensible maximum size, and a mistake (misuse of the field, incorrect data capture etc.) is more likely to throw an error than to add nonsense to the database that needs cleansing later.
So, my general rule of thumb is make them as large as the business requirements demand, but no larger. It's trivial to amend variable data sizes to a larger length in the future if requirements change.

How to calculate storage space used by NULL value?

It seems to be really hard to find accurate information about this. MSDN has an article about sparse columns, and which null percentage thresholds should be considered when using them. But the facts concerning default null storage space usage seem to be very difficult to come by.
Some sources claim that NULL values take no space whatsoever, but that would mean that sparse columns would be pointless in the first place. Some claim that only the null bitmap in the table definition adds a bit representing each nullable column, but that there's no further overhead. Some claim that fixed-length columns (char, int, bigint etc) actually use up the same amount of storage space regardless of whether the value is null or not.
So which is it, really?
Let's say I have a list of all the nullable columns in our DB with total rows in the table, and the number of NULL rows per each column and type. How would I calculate exactly how much space the NULL values are using now, so I could then predict exactly how much space is saved by altering the columns to sparse instead? I can add the 4 byte overhead to the non-null rows just fine, but it doesn't help when I have no idea what to do with the null rows?
For fixed-length types such as int NULL, it always uses the length of the type (i.e. 4 bytes for int, whether the value is NULL or not).
For variable-length types, it takes 0 bytes to store the NULL + 2 bytes in the variable-length column offset list. This is used to record where each variable-length value is really stored in the row on the page.
In addition, the NULL or NOT NULL flag uses 1 bit for each column. A table with 12 columns will use 12/8 bytes rounded up (= 2 bytes of NULL bitmap).
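The "12/8 bytes" figure is a division rounded up to whole bytes. A minimal Python sketch of that rounding, using a hypothetical helper name:

```python
def null_bitmap_bytes(column_count: int) -> int:
    """Bytes needed for one flag bit per column, rounded up to whole bytes."""
    return (column_count + 7) // 8


print(null_bitmap_bytes(12))  # the 12-column table from the text
print(null_bitmap_bytes(8))   # exactly one full byte
```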
This link will give you a lot more information on the subject
Once you know the percentage of NULLs, you can look at this link for an estimate of the potential gain. Sparse saves space on NULL values but requires more space for non-NULL values.
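For a first back-of-the-envelope estimate on a fixed-length column, the trade-off can be sketched in Python as below. This is only a rough model: it takes the 4-byte overhead per non-NULL sparse value from the question, and it ignores any other per-row overhead of sparse storage, so the real break-even point is likely to need a higher NULL percentage than this model suggests. The function name and the row counts are made up for illustration; measure on real data before deciding.

```python
def sparse_savings_bytes(total_rows: int, null_rows: int,
                         fixed_width: int, sparse_overhead: int = 4) -> int:
    """Estimated bytes saved by making one fixed-length column SPARSE.

    A positive result means the sparse column is smaller under this model.
    """
    non_null_rows = total_rows - null_rows
    regular = total_rows * fixed_width  # fixed-length: width is stored even for NULL
    sparse = non_null_rows * (fixed_width + sparse_overhead)  # NULLs cost nothing
    return regular - sparse


# Hypothetical int column (4 bytes) in a table of 1,000,000 rows.
print(sparse_savings_bytes(1_000_000, 900_000, 4))  # 90% NULL
print(sparse_savings_bytes(1_000_000, 500_000, 4))  # 50% NULL: break-even in this model
print(sparse_savings_bytes(1_000_000, 100_000, 4))  # 10% NULL: sparse is larger
```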

Memory Usage in SQL Server?

I have been thinking about database design lately and I have the following question:
When a type, say a varchar(max), is set for a column is 2GB of space set aside every time a row is inserted?
Or is the space allocated on the server equal to the amount of data in the column?
Thanks!
The varchar data type in SQL Server and elsewhere means roughly variable-length character data. The max or any other constant value represents its upper bound and not its absolute size. So your latter observation is correct:
the space allocated on the server equal to the amount of data in the column
Now, if you define something like char(200) (notice the lack of var in front of char there) then yes, 200 characters are allocated regardless of how much data (up to 200 chars) you store in that field. The maximum upper bound for the char data type is 8000, by the way.
Of course not :)
Varchar(max) is a bastardised hybrid data type (if you do not mind me being a bit frank):
It is stored with the rest of the columns in the row if total row length does not exceed 8KB limit
Otherwise stored as a blob. A blob (TEXT,IMAGE) takes as much as its length.

What is the value of NULL in SQL Server?

I know NULL is not zero... nor is it an empty string. But then what is the value of NULL... which the system keeps, to identify it?
NULL is a special element of the SQL language that is neither equal to nor unequal to any value in any data type.
As you said, NULL is not zero, an empty string, or false. I.e. false = NULL returns UNKNOWN.
Some people say NULL is not a value, it's a state. The state of having no value. Sort of like a Zen Koan. :-)
I don't know specifically how MS SQL Server stores it internally, but it doesn't matter, as long as they implement it according to the SQL standard.
I believe that each row has a null bitmap covering the columns that allow nulls. If the value in a given column is null, the corresponding bit in the bitmap is 1; otherwise it is 0.
The SQL Server row format is described in MSDN, and also analyzed on various blogs, like Paul Randal's Anatomy of a Record. The important information is the record structure:
- record header
  - 4 bytes long
  - two bytes of record metadata (record type)
  - two bytes pointing forward in the record to the NULL bitmap
- fixed length portion of the record, containing the columns storing data types that have fixed lengths (e.g. bigint, char(10), datetime)
- NULL bitmap
  - two bytes for count of columns in the record
  - variable number of bytes to store one bit per column in the record, regardless of whether the column is nullable or not
  - this allows an optimization when reading columns that are NULL
- variable-length column offset array
  - two bytes for the count of variable-length columns
  - two bytes per variable length column, giving the offset to the end of the column value
- versioning tag
  - a 14-byte structure that contains a timestamp plus a pointer into the version store in tempdb
So NULL fields have a bit set in the NULL bitmap.
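Adding up the parts of the record structure listed above gives a rough per-row size estimate. The Python sketch below is a simplified model, not SQL Server's actual algorithm: the function name is hypothetical, it assumes the variable-length offset array is absent when the table has no variable-length columns, and it counts a NULL variable-length value as length 0 with its 2-byte offset entry still present.

```python
from typing import Sequence


def estimate_record_bytes(column_count: int, fixed_bytes: int,
                          var_lengths: Sequence[int],
                          versioned: bool = False) -> int:
    """Rough size of one uncompressed data record, following the layout above."""
    size = 4                              # record header
    size += fixed_bytes                   # fixed-length portion
    size += 2 + (column_count + 7) // 8   # column count + one bit per column
    if var_lengths:                       # assumption: omitted with no var columns
        size += 2 + 2 * len(var_lengths)  # var column count + 2-byte offsets
        size += sum(var_lengths)          # the variable-length data itself
    if versioned:
        size += 14                        # versioning tag
    return size


# Hypothetical table: int (4 bytes), char(10) (10 bytes), one varchar column.
print(estimate_record_bytes(3, 14, [5]))  # varchar holds 5 bytes
print(estimate_record_bytes(3, 14, [0]))  # varchar is NULL
```

In this model, setting the varchar to NULL saves only its 5 data bytes; the header, bitmap and offset entry stay the same.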
I doubt the interviewers wanted you to know exactly how SQL Server stores nulls; the point of such a question is to get you to think about how you'd store special values. You can't use a sentinel value (or magic number), as that would make any rows with that value in them suddenly become null.
There are quite a few ways to achieve this. The two most straightforward that come to mind are these. The first is to store a flag with each nullable value that is basically an isNull flag (this is also basically how Nullable<T> works in .NET). The second is to store with each row a bitmap of null flags, one for each column.
When faced with such an interview question, the absolute worst response is to sit and stare blankly. Think out loud some, admit that you don't know how SQL Server does it, and then present some reasonable sounding ways of doing it. You should also be ready to talk a bit about why you'd pick one method over another, and what the pluses and minuses of each are.
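The second method, a per-row bitmap of null flags, can be sketched in a few lines of Python. This is an illustration of the general idea only; the function names and the bit order (column 0 in the lowest bit of the first byte) are assumptions for the example and are not meant to describe SQL Server's on-disk format.

```python
from typing import Optional, Sequence


def build_null_bitmap(row: Sequence[Optional[object]]) -> bytes:
    """Return one flag bit per column; a set bit marks a NULL (None) value."""
    bitmap = bytearray((len(row) + 7) // 8)
    for index, value in enumerate(row):
        if value is None:
            bitmap[index // 8] |= 1 << (index % 8)
    return bytes(bitmap)


def is_null(bitmap: bytes, column_index: int) -> bool:
    """Check the flag bit for one column."""
    return bool(bitmap[column_index // 8] & (1 << (column_index % 8)))


row = (7, None, "abc", None)
bitmap = build_null_bitmap(row)
print(bitmap.hex())        # columns 1 and 3 are NULL
print(is_null(bitmap, 1))  # True
print(is_null(bitmap, 2))  # False
```

Compared with a per-value flag, the bitmap keeps the cost at one bit per column and lets a reader test for NULL without touching the value itself.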
As Bill said, it's a state not a value.
In a row of a SQL Server table it's recorded in the null bitmap: no value is actually stored.
One quirk of SQL Server:
SELECT TOP 1 NULL AS foo INTO dbo.bar FROM sys.columns
What datatype is foo? It has no meaning of course, but it caught me out once.
Conceptually, NULL means “a missing unknown value” and it is treated somewhat differently from other values.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Two NULL values are regarded as equal in a GROUP BY.
When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.
The value of NULL means, in essence, a missing value; some people also use the term unknown.
A NULL value is also not equal to anything, not even another NULL.
Take a look at these examples:
-- prints 'is equal'
IF 1 = 1
    PRINT 'is equal'
ELSE
    PRINT 'NOT equal'
-- prints 'is equal' (an implicit conversion happens)
IF 1 = '1'
    PRINT 'is equal'
ELSE
    PRINT 'NOT equal'
-- prints 'NOT equal' (with the default SET ANSI_NULLS ON, NULL = NULL evaluates to UNKNOWN)
IF NULL = NULL
    PRINT 'is equal'
ELSE
    PRINT 'NOT equal'

Resources