I know NULL is not zero... nor is it an empty string. But then what is the value of NULL... which the system keeps, to identify it?
NULL is a special element of the SQL language that is neither equal to nor unequal to any value in any data type.
As you said, NULL is not zero, an empty string, or false. I.e. false = NULL returns UNKNOWN.
Some people say NULL is not a value, it's a state. The state of having no value. Sort of like a Zen Koan. :-)
I don't know specifically how MS SQL Server stores it internally, but it doesn't matter, as long as they implement it according to the SQL standard.
I believe that for each column that allows nulls, the rows have a null bitmap. If the row's value in the specified column is null, the bit in the bitmap is 1; otherwise it is 0.
The SQL Server row format is described in MSDN, and also analyzed on various blogs, like Paul Randal's Anatomy of a Record. The important information is the record structure:
- record header
  - 4 bytes long
  - two bytes of record metadata (record type)
  - two bytes pointing forward in the record to the NULL bitmap
- fixed length portion of the record, containing the columns storing data types that have fixed lengths (e.g. bigint, char(10), datetime)
- NULL bitmap
  - two bytes for count of columns in the record
  - variable number of bytes to store one bit per column in the record, regardless of whether the column is nullable or not
  - this allows an optimization when reading columns that are NULL
- variable-length column offset array
  - two bytes for the count of variable-length columns
  - two bytes per variable length column, giving the offset to the end of the column value
- versioning tag
  - a 14-byte structure that contains a timestamp plus a pointer into the version store in tempdb
So NULL fields have a bit set in the NULL bitmap.
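As a rough illustration of the NULL bitmap described above, the following Python sketch packs one bit per column into bytes, preceded by a two-byte column count. The function names, the little-endian byte order and the bit ordering are assumptions made for the example; it is not a byte-exact reproduction of SQL Server's on-disk format.

```python
import struct

def build_null_bitmap(row):
    """Return a 2-byte column count followed by one bit per column (1 = NULL)."""
    bitmap = bytearray((len(row) + 7) // 8)   # one bit per column, rounded up to whole bytes
    for i, value in enumerate(row):
        if value is None:                     # None stands in for SQL NULL
            bitmap[i // 8] |= 1 << (i % 8)
    return struct.pack("<H", len(row)) + bytes(bitmap)

def is_null(encoded, column_index):
    """Check a column's bit without reading the column data itself."""
    bitmap = encoded[2:]                      # skip the 2-byte column count
    return bool(bitmap[column_index // 8] & (1 << (column_index % 8)))

# Nine columns need two bitmap bytes; columns 1, 3 and 8 are NULL.
encoded = build_null_bitmap([42, None, "abc", None, 7, 8, 9, 10, None])
```

Because every column gets a bit, a reader can decide whether a column is NULL from the bitmap alone, which is the optimization mentioned in the list.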
I doubt the interviewers wanted you to know exactly how SQL Server stores nulls; the point of such a question is to get you to think about how you'd store special values. You can't use a sentinel value (or magic number), as that would make any rows with that value in them suddenly become null.
There are quite a few ways to achieve this. The most straightforward 2 that come to mind are to have a flag stored with each nullable value that is basically an isNull flag (this is also basically how Nullable<T> works in .NET). A 2nd method is to store with each row a bitmap of null flags, one for each column.
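To make the contrast concrete, here is a minimal Python sketch of the sentinel problem next to the per-value flag approach. `SENTINEL`, `decode_with_sentinel` and `NullableInt` are hypothetical names invented for this example; the class only loosely mirrors the idea behind Nullable<T>.

```python
from dataclasses import dataclass
from typing import Optional

SENTINEL = -1  # hypothetical "magic number" chosen to mean NULL

def decode_with_sentinel(stored: int) -> Optional[int]:
    # A genuine -1 is indistinguishable from NULL here.
    return None if stored == SENTINEL else stored

@dataclass
class NullableInt:
    """A value stored together with its own is-null flag."""
    is_null: bool
    value: int = 0  # ignored when is_null is True

    def get(self) -> Optional[int]:
        return None if self.is_null else self.value
```

With the flag, `NullableInt(False, -1)` still yields -1, while the sentinel scheme turns a legitimate -1 into a null.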
When faced with such an interview question, the absolute worst response is to sit and stare blankly. Think out loud some, admit that you don't know how SQL Server does it, and then present some reasonable-sounding ways of doing it. You should also be ready to talk a bit about why you'd pick one method over another, and what the pluses and minuses of each are.
As Bill said, it's a state not a value.
In a row of a SQL Server table, it's recorded in the null bitmap: no value is actually stored.
One quirk of SQL Server:
SELECT TOP 1 NULL AS foo INTO dbo.bar FROM sys.columns
What datatype is foo? It has no meaning of course, but it caught me out once.
Conceptually, NULL means “a missing unknown value” and it is treated somewhat differently from other values.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Two NULL values are regarded as equal in a GROUP BY.
When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.
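The grouping and ordering rules above can be tried out quickly. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, on the assumption that no MySQL server is at hand; SQLite happens to share these two behaviors, but it is a different engine, so treat it as an illustration rather than a test of MySQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(2,), (None,), (1,), (None,)])

# Both NULL rows land in a single group.
groups = conn.execute(
    "SELECT x, COUNT(*) FROM t GROUP BY x ORDER BY x ASC"
).fetchall()

# NULLs come first when ascending and last when descending.
ascending = [row[0] for row in conn.execute("SELECT x FROM t ORDER BY x ASC")]
descending = [row[0] for row in conn.execute("SELECT x FROM t ORDER BY x DESC")]
```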
The value NULL means, in essence, a missing value; some people also use the term unknown.
A NULL value is also not equal to anything, not even another NULL.
Take a look at these examples
will print is equal
IF 1 = 1
PRINT 'is equal'
ELSE
PRINT 'NOT equal'
will print is equal (an implicit conversion happens)
IF 1 = '1'
PRINT 'is equal'
ELSE
PRINT 'NOT equal'
will print NOT equal
IF NULL = NULL
PRINT 'is equal'
ELSE
PRINT 'NOT equal'
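The reason the last example takes the ELSE branch is three-valued logic: the comparison yields UNKNOWN rather than TRUE or FALSE, and IF only takes its first branch on TRUE. A small Python model of that, with None standing in for both NULL and UNKNOWN, might look like this. `sql_equals` and `if_branch` are names made up for the sketch, and the implicit conversion of '1' is not modeled.

```python
def sql_equals(a, b):
    """SQL-style '=': returns True, False, or None (UNKNOWN)."""
    if a is None or b is None:
        return None          # any comparison involving NULL is UNKNOWN
    return a == b

def if_branch(condition):
    """IF takes the ELSE branch unless the condition is exactly TRUE."""
    return "is equal" if condition is True else "NOT equal"
```

This models the standard behavior, which SQL Server follows under its default ANSI_NULLS ON setting.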
Related
The Microsoft Documentation at https://learn.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-2017 says:
An integer data type that can take a value of 1, 0, or NULL.
The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.
The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.
Converting to bit promotes any nonzero value to 1.
How is it possible to store 1, 0 and NULL in a single bit?
Quoting a canonical answer by Mark Byers in the question How much size “Null” value takes in SQL Server regarding how SQL Server stores NULL in general:
In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.
So, I would expect the BIT type to behave the same as any other column, meaning that there would be a separate bit to keep track of whether the column is NULL or not NULL. Therefore, a BIT column in SQL Server actually uses two bits to keep track of the three values.
There is a NULL bitmap in each row (located through an offset in the row header) that keeps track of which columns are null.
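The "two bits for three values" idea, together with the byte-packing rule quoted from the documentation, can be sketched in Python. The function names and the choice of storing a 0 value bit for NULL are assumptions for illustration only.

```python
def encode_bit(value):
    """Return (null_bit, value_bit) for a nullable bit column value."""
    if value is None:
        return (1, 0)                 # value bit is meaningless when NULL
    return (0, 1 if value else 0)

def decode_bit(null_bit, value_bit):
    """Recover 1, 0, or None from the two stored bits."""
    return None if null_bit else value_bit

def bit_storage_bytes(bit_columns):
    """Bytes for the value bits: up to 8 bit columns share one byte."""
    return (bit_columns + 7) // 8
```

The null bit lives in the row's NULL bitmap rather than next to the value bit, so the two bits are stored in different parts of the row.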
It seems to be really hard to find accurate information about this. MSDN has an article about sparse columns, and which null percentage thresholds should be considered when using them. But the facts concerning default null storage space usage seem to be very difficult to come by.
Some sources claim that NULL values take no space whatsoever, but that would mean that sparse columns would be pointless in the first place. Some claim that only the null bitmap in the table definition adds a bit representing each nullable column, but that there's no further overhead. Some claim that fixed-length columns (char, int, bigint etc) actually use up the same amount of storage space regardless of whether the value is null or not.
So which is it, really?
Let's say I have a list of all the nullable columns in our DB with total rows in the table, and the number of NULL rows per each column and type. How would I calculate exactly how much space the NULL values are using now, so I could then predict exactly how much space is saved by altering the columns to sparse instead? I can add the 4 byte overhead to the non-null rows just fine, but it doesn't help when I have no idea what to do with the null rows?
For fixed length types such as int NULL, it always uses the length of the type (i.e. 4 bytes for int, whether the column is declared NULL or NOT NULL).
For variable length types, it takes 0 bytes to store the NULL + 2 bytes in the variable length columns offset list. This is used to record where each variable length value is really stored in the row on the page.
In addition, the NULL or NOT NULL flag uses 1 bit for each column. A table with 12 columns will use ceil(12/8) = 2 bytes for the NULL bitmap.
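The three rules above translate into a small back-of-the-envelope calculation. The Python sketch below simply encodes the figures stated in this answer; the function names are made up, and row compression, sparse columns and other row overhead are ignored.

```python
import math

def null_bitmap_bytes(column_count):
    """One bit per column, rounded up to whole bytes."""
    return math.ceil(column_count / 8)

def null_value_bytes(fixed_length, type_width=0):
    """Bytes a NULL occupies in the row, per the rules stated above."""
    if fixed_length:
        return type_width   # e.g. 4 for int, whether NULL or not
    return 2                # 0 data bytes + 2-byte entry in the offset list
```

For example, `null_bitmap_bytes(12)` gives 2, matching the 12-column table mentioned above.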
This link will give you a lot more information on the subject
Once you know the percentage of NULLs, you can look at this link for an estimate of the potential gain. Sparse saves space on NULL values but requires more space for non-NULL values.
What is the best way to store the following value in SQL Server ?
1234-56789 or
4567-12892
The value will always have 4 digits followed by a hyphen and 5 digits
char(10) is a possibility that I was thinking of using or removing the hyphen and storing as int
If it is a business requirement that "The value will always have 4 digits followed by a hyphen and 5 digits", then use CHAR(10). But if you think users should be able to add values even if they aren't in the expected format, then use VARCHAR(10) or VARCHAR(15), whatever suits you better.
You should store those kinds of values as int only if they really represent a number as opposed to a series of digits. A number is something that you can do calculations on, compare as a number, etc.
Otherwise store it as char. Make it length of 10 if the format is set and won't change.
Another option would be to create a CHAR(4) column and a CHAR(5) column. This would be useful (only) if you envision ever having to query against one or the other part independently.
Very easy to concatenate these back together using a view, computed column, or inline - so you don't have to waste storage space on a dash that will always be there, and so that you can keep these two pieces of data separate if, in fact, they are independent.
Since you didn't provide much detail about what these "numbers" represent or how they will be used / queried, you're going to get a whole bunch of opinions, some of which might not be very relevant to your data model.
Well, if it's guaranteed to always be like that, a char(10) datatype seems appropriate.
But you should also add a check constraint:
column LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]'
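To see a constraint of this shape reject bad input, here is a sketch using Python's built-in sqlite3 module as a stand-in database. Note the assumptions: SQLite's LIKE does not support bracket character classes, so GLOB is used with the same pattern, and the table name `codes` and helper `try_insert` are invented for the example. On SQL Server itself, the LIKE expression above would go inside a CHECK constraint.

```python
import sqlite3

# Same 4-digits, hyphen, 5-digits pattern as the LIKE expression above.
PATTERN = "[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]"

conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE codes (code CHAR(10) NOT NULL CHECK (code GLOB '{PATTERN}'))"
)

def try_insert(value):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO codes (code) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False
```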
Here is a SO answer that should help you sort out what you need -
nchar and nvarchar can store Unicode characters.
char and varchar cannot store Unicode characters.
char and nchar are fixed-length which will reserve storage space for number of characters you specify even if you don't use up all that space.
varchar and nvarchar are variable-length, which means they only use up space for the characters you store. They do not reserve storage like char or nchar.
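The fixed versus variable-length difference can be modeled in a few lines of Python. This is a simplified sketch, assuming single-byte characters and the commonly cited figure of data length plus 2 bytes of overhead for varchar; the function names are hypothetical.

```python
def char_storage(value, n):
    """char(n): the value is padded with spaces to exactly n bytes."""
    encoded = value.encode("ascii")
    if len(encoded) > n:
        raise ValueError("value does not fit in char(%d)" % n)
    return encoded.ljust(n, b" ")

def varchar_storage_bytes(value, n):
    """varchar(n): bytes actually stored, plus 2 bytes of overhead."""
    encoded = value.encode("ascii")
    if len(encoded) > n:
        raise ValueError("value does not fit in varchar(%d)" % n)
    return len(encoded) + 2
```

Under this model the declared maximum of a varchar does not affect the space used: "abc" costs the same in varchar(32) as in varchar(128), while char(10) always costs 10 bytes.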
I have a large table with, say, 10 columns. 4 of them remain null most of the time. My question is whether a null value takes up any space in bytes or none. I read a few articles; some of them say:
http://www.sql-server-citation.com/2009/12/common-mistakes-in-sql-server-part-4.html
There is a misconception that if we have the NULL values in a table it doesn't occupy storage space. The fact is, a NULL value occupies space – 2 bytes
SQL: Using NULL values vs. default values
A NULL value in databases is a system value that takes up one byte of storage and indicates that a value is not present as opposed to a space or zero or any other default value.
Can you please guide me regarding the size taken by null value.
If the field is fixed width storing NULL takes the same space as any other value - the width of the field.
If the field is variable width the NULL value takes up no space.
In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.
The reason for the discrepancies that you have observed in information from other sources:
The start of the first article is a bit misleading. The article is not talking about the cost of storing a NULL value, but the cost of having the ability to store a NULL (i.e the cost of making a column nullable). It's true that it costs something in storage space to make a column nullable, but once you have done that it takes less space to store a NULL than it takes to store a value (for variable width columns).
The second link seems to be a question about Microsoft Access. I don't know the details of how Access stores NULLs but I wouldn't be surprised if it is different to SQL Server.
The following link claims that if the column is variable length, i.e. varchar then NULL takes 0 bytes (plus 1 byte is used to flag whether value is NULL or not):
How does SQL Server really store NULL-s
The above link, as well as the below link, claim that for fixed length columns, i.e. char(10) or int, a value of NULL occupies the length of the column (plus 1 byte to flag whether it's NULL or not):
Data Type Performance Tuning Tips for Microsoft SQL Server
Examples:
If you set a char(10) to NULL, it occupies 10 bytes (zeroed out)
An int takes 4 bytes (also zeroed out).
A varchar(1 million) set to NULL takes 0 bytes (+ 2 bytes)
Note: on a slight tangent, the storage size of varchar is the length of data entered + 2 bytes.
From this link:
Each row has a null bitmap for columns that allow nulls. If the row in that column is null then a bit in the bitmap is 1 else it's 0.
For variable size datatypes the actual size is 0 bytes.
For fixed size datatype the actual size is the default datatype size in bytes set to default value (0 for numbers, '' for chars).
Storing a NULL value does not take any space.
"The fact is, a NULL value occupies space – 2 bytes."
This is a misconception: that's 2 bytes per row, and I'm pretty sure that all rows use those 2 bytes regardless of whether there are any nullable columns.
A NULL value in databases is a system value that takes up one byte of storage
This is talking about databases in general, not specifically SQL Server. SQL Server does not use 1 byte to store NULL values.
Even though this question is specifically tagged as SQL Server 2005, being that it is now 2021, it should be pointed out that it is a "trick question" for any version of SQL Server after 2005.
This is because if either ROW or PAGE compression is used, or if the column is defined as SPARSE, then it will take "no space" in the actual row to store a "NULL value". These features were added in SQL Server 2008.
The implementation notes for ROW COMPRESSION (which is a prerequisite for PAGE COMPRESSION) state:
NULL and 0 values across all data types are optimized and take no bytes [1].
While there is still minimal metadata (4 bits per column + (record overhead / columns)) stored per non-sparse column in each physical record [2], it's strictly not the value and is required in all cases [3].
SPARSE columns with a NULL value take up no space and no relevant per-row metadata (as the number of SPARSE columns increases), albeit with a trade-off for non-NULL values.
As such, it is hard to "count" space without analyzing the actual DB usage stats. The average bytes per row will vary based on precise column types, table/index rebuild settings, actual data and duplicity, fill capacity, effective page utilization, fragmentation, LOB usage, etc. and is often a more useful metric.
[1] SQLite uses a similar approach to have effectively-free NULL values.
[2] A brief description of the technical layout used in ROW (and thus PAGE) compression can be found in "SQL Server 2012 Internals: Special Storage".
Following the 1 or 2 bytes for the number of columns is the CD array, which uses 4 bits [of metadata] for each column in the table to represent information about the length of the column .. 0 (0x0) indicates that the corresponding column is NULL.
[3] Fun fact: with ROW compression, bit column values exist entirely in the corresponding 4-bit metadata.
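To get a feel for the 4-bits-per-column CD array quoted above, the following Python sketch packs two column descriptors into each byte. Only the "4 bits per column" size and "0 means NULL" come from the quoted text; the nibble order within a byte, the function name and the non-zero descriptor values are assumptions for illustration.

```python
NULL_DESCRIPTOR = 0x0  # per the quoted text, 0 marks a NULL column

def pack_cd_array(descriptors):
    """Pack one 4-bit descriptor per column, two columns per byte."""
    packed = bytearray((len(descriptors) + 1) // 2)
    for i, descriptor in enumerate(descriptors):
        if not 0 <= descriptor <= 0xF:
            raise ValueError("descriptor must fit in 4 bits")
        shift = 4 * (i % 2)               # assumed order: low nibble first
        packed[i // 2] |= descriptor << shift
    return bytes(packed)
```

Under this layout a NULL column costs half a byte of metadata and no data bytes.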
In sql server does it make a difference if I define a varchar column to be of length 32 or 128?
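A varchar is a variable-length character field. This means it can hold text data up to a certain length. A varchar(32) can only hold 32 characters, whereas a varchar(128) can hold 128 characters. If "12345" is converted to varchar(3) (for example with CAST, or by assigning it to a varchar(3) variable), this is the data that will be stored:
"123"
The "45" will be "truncated" (lost). Note that an INSERT of an over-length value directly into a varchar(3) column does not truncate silently under SQL Server's default settings; it fails with a truncation error instead.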
They are very useful in instances where you know that a certain field will only be (or only should be) a certain length at maximum. For example: a zip code or state abbreviation. In fact, they are generally used for almost all types of text data (such as names/addresses/et cetera) - but in these instances you must be careful that the number you supply is a sane maximum for the type of data that will fill that column.
However, you must also be careful to only allow the user to input the maximum number of characters that the field will support. Otherwise it may lead to confusion when the user's input is truncated or rejected.
There should be no noticeable difference as the backend will only store the amount of data you insert into that column. It's not padded out to the full size of the field like it is with a char column.
Edit: For more info, this link should be useful.
It should not. It just defines the maximum length the column can accommodate; the actual length used depends on the data inserted.