Storage of Bit columns for null values? - sql-server

The Microsoft Documentation at https://learn.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-2017 says:
An integer data type that can take a value of 1, 0, or NULL.
The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.
The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.
Converting to bit promotes any nonzero value to 1.
How is it possible to store 1, 0 and NULL in a single bit?

Quoting a canonical answer by @MarkByers in the question How much size "Null" value takes in SQL Server regarding how SQL Server stores NULL in general:
In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.
So, I would expect the BIT type to behave the same as any other column, meaning that there would be a separate bit to keep track of whether the column is NULL or not. Therefore, a BIT column in SQL Server actually uses two bits to keep track of the three values.

There is a NULL bitmap mask in the row header that keeps track of which columns are null and which are not.
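If you want to see both the packed bit values and the NULL bitmap for yourself, you can dump a data page. This is only a rough sketch: the table name is made up, and it relies on the undocumented DBCC PAGE command and the sys.dm_db_database_page_allocations function (SQL Server 2012+), so treat it as an exploration aid rather than production code.
-- Hypothetical table with 9 bit columns, so the bit values occupy 2 bytes in the row
CREATE TABLE dbo.BitDemo (b1 bit, b2 bit, b3 bit, b4 bit, b5 bit, b6 bit, b7 bit, b8 bit, b9 bit);
INSERT INTO dbo.BitDemo (b1, b9) VALUES (1, NULL);
-- Locate the data page holding the row
SELECT allocated_page_file_id, allocated_page_page_id
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('dbo.BitDemo'), NULL, NULL, 'DETAILED')
WHERE is_allocated = 1 AND page_type = 1;
-- Dump the page; dump style 3 prints each column value along with the row's NULL bitmap
DBCC TRACEON (3604);
DBCC PAGE ('<your database>', <file_id>, <page_id>, 3);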

Related

How to calculate storage space used by NULL value?

It seems to be really hard to find accurate information about this. MSDN has an article about sparse columns and the NULL percentage thresholds that should be considered when using them. But the facts concerning default NULL storage space usage seem to be very difficult to come by.
Some sources claim that NULL values take no space whatsoever, but that would mean that sparse columns would be pointless in the first place. Some claim that only the null bitmap in the table definition adds a bit representing each nullable column, but that there's no further overhead. Some claim that fixed-length columns (char, int, bigint etc) actually use up the same amount of storage space regardless of whether the value is null or not.
So which is it, really?
Let's say I have a list of all the nullable columns in our DB, with the total rows in each table and the number of NULL rows per column and type. How would I calculate exactly how much space the NULL values are using now, so I could then predict exactly how much space would be saved by altering the columns to sparse instead? I can add the 4-byte overhead to the non-null rows just fine, but that doesn't help when I have no idea what to do with the null rows.
For fixed-length types such as int NULL, it always uses the length of the type (i.e. 4 bytes for an int, whether it is set to NULL or NOT NULL).
For variable length types, it takes 0 bytes to store the NULL + 2 bytes in the variable length columns offset list. This is used to record where each variable length value is really stored in the row on the page.
In addition, the NULL or NOT NULL flag uses 1 bit for each column. A table with 12 columns will use 12/8 bytes rounded up (= 2 bytes) for the NULL bitmap.
This link will give you a lot more information on the subject
Once you know the percentage of NULLs, you can look at this link for an estimate of the potential gain. Sparse saves space on NULL values but requires more space for non-NULL values.
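As a first step toward that estimate, you can count the NULLs per column with plain aggregates, since COUNT(column) skips NULLs while COUNT(*) does not. A minimal sketch, assuming a hypothetical table dbo.MyTable with nullable columns Col1 and Col2:
SELECT
    COUNT(*)               AS total_rows,
    COUNT(*) - COUNT(Col1) AS col1_null_rows,
    COUNT(*) - COUNT(Col2) AS col2_null_rows,
    100.0 * (COUNT(*) - COUNT(Col1)) / NULLIF(COUNT(*), 0) AS col1_null_pct
FROM dbo.MyTable;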

Optimize a string comparison by changing it to 5 int fields or 3 bigint fields?

I am attempting to find a way to optimize a comparison between two SHA1 values in a SQL Server 2008 R2 database. These are currently 40-character hexadecimal values stored as char(40) within the database. The values are indexed. The list of 'known values' is comprised of 21,082,054 unique entries. This will be used to compare against data sets that can range in size from under a dozen to billions of entries.
As a software developer I understand that a 40-character string comparison compares 40 separate values, one at a time, with an early-out option (as soon as they differ, the comparison ends). So the next logical attempt to improve this would seem to be to convert the hexadecimal value into integer values. This leaves me with five 32-bit integers or three 64-bit integers (int and long, respectively, in most languages these days).
What I am not sure of is how well this line of thinking translates into the SQL-Server 2008 environment. Currently the SHA1 is the Primary Key of the table. To keep that same requirement on the data I would then have to make the primary key 5 or 3 separate fields, build an index on all of those fields and then replicate these changes from the known length table to the unknown length tables.
TL;DR: Will changing a 40-character hexadecimal string into separate integer fields increase comparison/lookup speed?
I doubt you have to care about that.
A 40-character string comparison is not comparing all 40 characters, unless the first 39 characters are equal.
Nearly all the time it will stop after 1 character.
Most of the rest of the time it will stop after 2.
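Put differently, the indexed char(40) lookup is already close to optimal: the B-tree seek dominates, not the final character-by-character compare. A minimal sketch of the existing design, with hypothetical names:
-- Hypothetical known-values table keyed on the hex SHA-1
CREATE TABLE dbo.KnownHashes
(
    Sha1 char(40) NOT NULL CONSTRAINT PK_KnownHashes PRIMARY KEY
);
-- A membership check is a single clustered index seek
SELECT 1
FROM dbo.KnownHashes
WHERE Sha1 = '356A192B7913B04C54574D18C28D46E6395428AB';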

How much size "Null" value takes in SQL Server

I have a large table with, say, 10 columns. 4 of them remain NULL most of the time. My question is: does a NULL value take up any size in bytes, or no size at all? I read a few articles; some of them say:
http://www.sql-server-citation.com/2009/12/common-mistakes-in-sql-server-part-4.html
There is a misconception that if we have the NULL values in a table it doesn't occupy storage space. The fact is, a NULL value occupies space – 2 bytes
SQL: Using NULL values vs. default values
A NULL value in databases is a system value that takes up one byte of storage and indicates that a value is not present as opposed to a space or zero or any other default value.
Can you please guide me regarding the size taken by a NULL value?
If the field is fixed width storing NULL takes the same space as any other value - the width of the field.
If the field is variable width the NULL value takes up no space.
In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.
The reason for the discrepancies that you have observed in information from other sources:
The start of the first article is a bit misleading. The article is not talking about the cost of storing a NULL value, but the cost of having the ability to store a NULL (i.e. the cost of making a column nullable). It's true that it costs something in storage space to make a column nullable, but once you have done that it takes less space to store a NULL than it takes to store a value (for variable-width columns).
The second link seems to be a question about Microsoft Access. I don't know the details of how Access stores NULLs but I wouldn't be surprised if it is different to SQL Server.
The following link claims that if the column is variable length, i.e. varchar then NULL takes 0 bytes (plus 1 byte is used to flag whether value is NULL or not):
How does SQL Server really store NULL-s
The above link, as well as the below link, claim that for fixed length columns, i.e. char(10) or int, a value of NULL occupies the length of the column (plus 1 byte to flag whether it's NULL or not):
Data Type Performance Tuning Tips for Microsoft SQL Server
Examples:
If you set a char(10) to NULL, it occupies 10 bytes (zeroed out)
An int takes 4 bytes (also zeroed out).
A varchar(1 million) set to NULL takes 0 bytes (+ 2 bytes)
Note: on a slight tangent, the storage size of varchar is the length of data entered + 2 bytes.
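If you would rather verify these numbers on your own tables than take them on faith, the average physical row size reported by sys.dm_db_index_physical_stats is one way to observe the effect of setting a fixed-width column to NULL. A rough sketch, assuming a hypothetical table dbo.MyTable:
-- Average record size (bytes) and row count per index level
SELECT index_id, index_level, avg_record_size_in_bytes, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED');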
From this link:
Each row has a null bitmap for columns that allow nulls. If the row in that column is null then a bit in the bitmap is 1, else it's 0.
For variable-size datatypes the actual size is 0 bytes.
For fixed-size datatypes the actual size is the default datatype size in bytes set to the default value (0 for numbers, '' for chars).
Storing a NULL value does not take any space.
"The fact is, a NULL value occupies
space – 2 bytes."
This is a misconception -- that's 2 bytes per row, and I'm pretty sure that all rows use those 2 bytes regardless of whether there's any nullable columns.
"A NULL value in databases is a system value that takes up one byte of storage"
This is talking about databases in general, not specifically SQL Server. SQL Server does not use 1 byte to store NULL values.
Even though this question is specifically tagged as SQL Server 2005, it is now 2021, so it should be pointed out that it is a "trick question" for any version of SQL Server after 2005.
This is because if either ROW or PAGE compression is used, or if the column is defined as SPARSE, then a NULL value takes "no space" in the actual row. These features were added in SQL Server 2008.
The implementation notes for ROW COMPRESSION (which is a prerequisite for PAGE COMPRESSION) states:
NULL and 0 values across all data types are optimized and take no bytes [1].
While there is still minimal metadata (4 bits per column + (record overhead / columns)) stored per non-sparse column in each physical record [2], it's strictly not the value and is required in all cases [3].
SPARSE columns with a NULL value take up no space and no relevant per-row metadata (as the number of SPARSE columns increase), albeit with a trade-off for non-NULL values.
As such, it is hard to "count" space without analyzing the actual DB usage stats. The average bytes per row will vary based on precise column types, table/index rebuild settings, actual data and duplicity, fill capacity, effective page utilization, fragmentation, LOB usage, etc., and is often a more useful metric.
[1] SQLite uses a similar approach to have effectively-free NULL values.
[2] A brief description of the technical layout used in ROW (and thus PAGE) compression can be found in "SQL Server 2012 Internals: Special Storage".
Following the 1 or 2 bytes for the number of columns is the CD array, which uses 4 bits [of metadata] for each column in the table to represent information about the length of the column .. 0 (0x0) indicates that the corresponding column is NULL.
[3] Fun fact: with ROW compression, bit column values exist entirely in the corresponding 4-bit metadata.
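For completeness, enabling the ROW compression discussed above, and estimating its effect beforehand, looks roughly like this; the table name is a placeholder:
-- Estimate the savings first (available since SQL Server 2008)
EXEC sys.sp_estimate_data_compression_savings
     @schema_name = 'dbo', @object_name = 'MyTable',
     @index_id = NULL, @partition_number = NULL,
     @data_compression = 'ROW';
-- Then rebuild the table with ROW compression
ALTER TABLE dbo.MyTable REBUILD WITH (DATA_COMPRESSION = ROW);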

Set minimum and exact length in bytes of column values in SQL Server

How can one set the column value length to a minimum and/or exact length using MS SQL Management Studio?
For character datatypes (char/nchar/varchar/nvarchar), you set the length when you define the column.
Remember that for char and nchar, you'll ALWAYS be allocated the full x bytes of space, and your string will be padded. nchar always has 2x bytes allocated (two bytes per character), but the string length will be as you specify.
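For example, a hypothetical table showing the four character types with an explicit length (for the n-prefixed types the length is in characters, and each character takes 2 bytes):
CREATE TABLE dbo.LengthDemo
(
    FixedAnsi    char(20)     NOT NULL, -- always occupies 20 bytes, padded with spaces
    VarAnsi      varchar(20)  NULL,     -- up to 20 bytes of data plus 2 bytes of overhead
    FixedUnicode nchar(20)    NOT NULL, -- always occupies 40 bytes
    VarUnicode   nvarchar(20) NULL      -- up to 40 bytes of data plus 2 bytes of overhead
);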
For string-length minimums, that really should be left to the application.
SQL Server does allow you to put a constraint on a column.
In this example, the constraint is that the length of the column value must be >= 10.
Simply write a T-SQL expression in the Expression field.
LEN(myColumn) = 20 – the value being inserted/updated into the column must have a length of 20.
LEN(RTRIM(LTRIM(myColumn))) = 20 – the value being inserted/updated into the column must have a length of 20 after all leading/trailing whitespace is removed.
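Outside the SSMS dialog, the same rule can be added in T-SQL as a named check constraint. A sketch with hypothetical table and constraint names:
ALTER TABLE dbo.MyTable
ADD CONSTRAINT CK_MyTable_myColumn_Length
CHECK (LEN(LTRIM(RTRIM(myColumn))) = 20);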
Not sure what you are asking here. If you are asking how to make sure the user only enters a certain number of characters into a column of a table, SSMS is not the tool to do this. You would need to enforce this at the database level, through a check constraint, for example. Or through your application.

What is the value of NULL in SQL Server?

I know NULL is not zero... nor is it an empty string. But then what is the value of NULL... which the system keeps to identify it?
NULL is a special element of the SQL language that is neither equal to nor unequal to any value in any data type.
As you said, NULL is not zero, an empty string, or false. I.e. false = NULL returns UNKNOWN.
Some people say NULL is not a value, it's a state. The state of having no value. Sort of like a Zen Koan. :-)
I don't know specifically how MS SQL Server stores it internally, but it doesn't matter, as long as they implement it according to the SQL standard.
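A quick way to see the three-valued logic in action is a CASE expression: with the default SET ANSI_NULLS ON, neither the = branch nor the <> branch is taken, so it falls through to ELSE. A small sketch:
SELECT CASE
           WHEN 1 = NULL  THEN 'equal'
           WHEN 1 <> NULL THEN 'not equal'
           ELSE 'unknown' -- both comparisons evaluate to UNKNOWN
       END AS comparison_result;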
I believe that for each column that allows nulls, the rows have a null bitmap. If the row's value in the specified column is null, the bit in the bitmap is 1; otherwise it is 0.
The SQL Server row format is described in MSDN, and also analyzed on various blogs, like Paul Randal's Anatomy of a Record. The important information is the record structure:
- record header
  - 4 bytes long
  - two bytes of record metadata (record type)
  - two bytes pointing forward in the record to the NULL bitmap
- fixed-length portion of the record, containing the columns storing data types that have fixed lengths (e.g. bigint, char(10), datetime)
- NULL bitmap
  - two bytes for the count of columns in the record
  - a variable number of bytes to store one bit per column in the record, regardless of whether the column is nullable or not
  - this allows an optimization when reading columns that are NULL
- variable-length column offset array
  - two bytes for the count of variable-length columns
  - two bytes per variable-length column, giving the offset to the end of the column value
- versioning tag
  - a 14-byte structure that contains a timestamp plus a pointer into the version store in tempdb
So NULL fields have a bit set in the NULL bitmap.
I doubt the interviewers wanted you to know exactly how SQL Server stores nulls; the point of such a question is to get you to think about how you'd store special values. You can't use a sentinel value (or magic number), as that would make any rows with that value in them suddenly become null.
There are quite a few ways to achieve this. The two most straightforward that come to mind are: store a flag with each nullable value, essentially an isNull flag (this is also basically how Nullable<T> works in .NET); or store with each row a bitmap of null flags, one per column.
When faced with such an interview questions the absolute worst response is to sit and stare blankly. Think out loud some, admit that you don't know how SQL Server does it, and then present some reasonable sounding ways of doing it. You should also be ready to talk a bit about why you'd pick one method over another, and what the pluses and minuses of each are.
As Bill said, it's a state, not a value.
In a row in a SQL Server table it's recorded in the null bitmap: no value is actually stored.
One quirk of SQL Server:
SELECT TOP 1 NULL AS foo INTO dbo.bar FROM sys.columns
What datatype is foo? It has no meaning of course, but it caught me out once.
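If you are curious, the catalog views will tell you what type SQL Server chose for the column (a quick sketch; on the versions I have tried, an untyped NULL in SELECT ... INTO comes out as int):
SELECT c.name AS column_name, t.name AS type_name, c.is_nullable
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.bar');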
Conceptually, NULL means “a missing unknown value” and it is treated somewhat differently from other values.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Two NULL values are regarded as equal in a GROUP BY.
When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.
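SQL Server agrees on the last two points (NULLs are grouped together, and they sort first with ASC). A quick sketch with a throwaway table variable:
DECLARE @t TABLE (val int NULL);
INSERT INTO @t (val) VALUES (1), (NULL), (2), (NULL);
-- NULLs form a single group: one row has val = NULL and cnt = 2
SELECT val, COUNT(*) AS cnt FROM @t GROUP BY val;
-- NULLs come first ascending and last descending
SELECT val FROM @t ORDER BY val ASC;
SELECT val FROM @t ORDER BY val DESC;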
The value of NULL means, in essence, a missing value; some people will also use the term unknown.
A NULL value is also not equal to anything, not even another NULL.
Take a look at these examples
will print "is equal"
IF 1 = 1
    PRINT 'is equal'
ELSE
    PRINT 'NOT equal'

will print "is equal" (an implicit conversion happens)
IF 1 = '1'
    PRINT 'is equal'
ELSE
    PRINT 'NOT equal'

will print "NOT equal" (the comparison with NULL evaluates to UNKNOWN)
IF NULL = NULL
    PRINT 'is equal'
ELSE
    PRINT 'NOT equal'
