When does the space get allocated - sql-server

If I declare a column as nvarchar(max), I understand that this allows up to 2 GB of data, but does it actually take the 2 GB of disk space straight away once I save the changes to the table? Or does it simply note that this column will allow up to 2 GB of data to be stored?

As I understand it, space isn't allocated until it is needed.
Try the following queries:
CREATE TABLE SizeTest (
MyID int primary key
)
INSERT INTO SizeTest SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
EXEC sp_spaceused 'SizeTest'
ALTER TABLE SizeTest ADD MyBigText nvarchar(max)
EXEC sp_spaceused 'SizeTest'
UPDATE SizeTest SET MyBigText = 'This is big text' WHERE MyID = 1
EXEC sp_spaceused 'SizeTest'
DROP TABLE SizeTest
By executing these statements, you should get the following for all three sp_spaceused calls:
name      rows   reserved   data   index_size   unused
SizeTest  5      16 KB      8 KB   8 KB         0 KB
At no point is the 2GB allocated.
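As a hedged follow-up sketch (not part of the original demo): if you actually store a large value, the allocation grows only by what that value needs, still nowhere near 2 GB. The REPLICATE/CAST trick below is just one way to build a long test string; run it before the DROP TABLE in the script above.
-- Store a genuinely large value and re-check the allocation.
-- Casting to nvarchar(max) first prevents REPLICATE from truncating at 4000 characters.
UPDATE SizeTest
SET MyBigText = REPLICATE(CAST(N'X' AS nvarchar(max)), 100000) -- ~200,000 bytes, pushed off-row
WHERE MyID = 2
EXEC sp_spaceused 'SizeTest' -- reserved/data grow by a few hundred KB, not by 2 GB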

The maximum storage size for VARCHAR(MAX) is 2^31-1 bytes (2,147,483,647 bytes or 2GB - 1 bytes). The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. Since each character in a VARCHAR data type uses one byte, the maximum length for a VARCHAR(MAX) data type is 2,147,483,645.
The maximum storage size for NVARCHAR(MAX) is also 2^31-1 bytes (2,147,483,647 bytes or 2GB - 1 bytes). The storage size, in bytes, is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length. Since each Unicode character in an NVARCHAR data type uses two bytes, the maximum length for an NVARCHAR(MAX) data type is 1,073,741,822.
The maximum storage size for VARBINARY(MAX) is the same as the maximum storage size for VARCHAR(MAX) and NVARCHAR(MAX), which is 2^31-1 (2,147,483,647 bytes or 2GB - 1 bytes). The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length.
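A minimal sketch (not from the quoted documentation) showing that the storage consumed reflects the actual value, not the declared maximum:
DECLARE @v varchar(max) = 'abc', @n nvarchar(max) = N'abc'
SELECT DATALENGTH(@v) AS varchar_bytes,  -- 3: one byte per character
       DATALENGTH(@n) AS nvarchar_bytes  -- 6: two bytes per character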

"Will allow 2 GB of data" :) I think so. It allows up to 2 GB, but it will not allocate that space straight away.

Related

Space consumed by a particular column's data and impact on deleting that column

I am using an Oracle 12c database in my project and I have a column "Name" of type VARCHAR2(128 CHAR) NOT NULL. I have approximately 25328687 rows in my table.
Now I no longer need the "Name" column, so I want to delete it. When I calculated the total size of the data in this column (using lengthb and vsize) for all the rows, it was approximately 1.07 GB.
Since the maximum size of the data in this column is specified, won't every row be allocated 128 bytes for this column (ignoring Unicode for simplicity), so that the total space consumed by the column is 128 * number of rows = 3,242,071,936 bytes, or 3.24 GB?
Oracle VARCHAR2 allocates storage dynamically (by definition it is a variable-length string data type).
CHAR is a fixed-length string data type.
create table x (a char(5), b varchar2(5));
insert into x values ('RAM', 'RAM');
insert into x values ('RAMA', 'RAMA');
insert into x values ('RAMAN', 'RAMAN');
SELECT * FROM x WHERE length(a) = 3; -> this will return 0 rows (CHAR values are blank-padded to length 5)
SELECT * FROM x WHERE length(b) = 3; -> this will return 1 row ('RAM')
SELECT length(a) AS len_a, length(b) AS len_b FROM x;
The output will be:
len_a | len_b
------+------
    5 |     3
    5 |     4
    5 |     5
Oracle allocates storage dynamically for VARCHAR2.
So a string of 4 characters will take 5 bytes: one byte for the length and 4 bytes for the 4 characters, assuming a single-byte character set.
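To see the byte counts directly, a small hedged addition to the example above: VSIZE reports the bytes Oracle actually stores for each value (again assuming a single-byte character set).
SELECT a, vsize(a) AS bytes_a, b, vsize(b) AS bytes_b FROM x;
-- Expected: bytes_a is always 5 (CHAR is blank-padded),
-- while bytes_b is 3, 4 and 5 for 'RAM', 'RAMA' and 'RAMAN' respectively.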
As the other answers say, the storage that a VARCHAR2 column uses is VARying. To get an estimate of the actual amount, you can use
1) The data dictionary
SELECT column_name, avg_col_len, last_analyzed
FROM ALL_TAB_COL_STATISTICS
WHERE owner = 'MY_SCHEMA'
AND table_name = 'MY_TABLE'
AND column_name = 'MY_COLUMN';
The result avg_col_len is the average column length. Multiply it by your number of rows (25328687) and you get an estimate of roughly how many bytes this column uses. (If last_analyzed is NULL or very old compared to the last big data change, you'll have to refresh the optimizer statistics with DBMS_STATS.GATHER_TABLE_STATS('MY_SCHEMA','MY_TABLE') first.)
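Putting that together, a hedged sketch of the whole estimate in one query, using the row count from the question (25328687):
SELECT avg_col_len,
       avg_col_len * 25328687 AS estimated_bytes,
       round(avg_col_len * 25328687 / 1024 / 1024) AS estimated_mb
FROM all_tab_col_statistics
WHERE owner = 'MY_SCHEMA'
  AND table_name = 'MY_TABLE'
  AND column_name = 'MY_COLUMN';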
2) Count yourself in sample
SELECT sum(s), count(*), avg(s), stddev(s)
FROM (
SELECT vsize(my_column) as s
FROM my_schema.my_table SAMPLE (0.1)
);
This calculates the storage size of a 0.1 percent sample of your table.
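To extrapolate from the sample to the whole table, a small hedged sketch: scale the sampled sum up by the inverse of the sample percentage (0.1% means a factor of 1000). Sampling is approximate, so treat this as a rough estimate.
SELECT sum(vsize(my_column)) * 1000 AS estimated_total_bytes
FROM my_schema.my_table SAMPLE (0.1);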
3) To know for sure, I'd do a test with a subset of the data
CREATE TABLE my_test TABLESPACE my_scratch_tablespace NOLOGGING AS
SELECT * FROM my_schema.my_table SAMPLE (0.1);
-- get the size of the test table in megabytes
SELECT round(bytes/1024/1024) AS mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
-- now drop the column
ALTER TABLE my_test DROP (my_column);
-- and measure again (the segment usually keeps its size until the table is reorganized)
SELECT round(bytes/1024/1024) AS mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
-- reorganize the table to check how much space is actually freed up
ALTER TABLE my_test MOVE;
SELECT round(bytes/1024/1024) AS mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
You could improve the test by using the same PCTFREE and COMPRESSION levels on your test table.
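For example (a sketch; the settings shown are placeholders that you'd replace with the source table's actual values from ALL_TABLES):
-- Look up the source table's settings first
SELECT pct_free, compression, compress_for
FROM all_tables
WHERE owner = 'MY_SCHEMA' AND table_name = 'MY_TABLE';
-- Then create the test table with matching physical attributes (placeholder values shown)
CREATE TABLE my_test PCTFREE 10 TABLESPACE my_scratch_tablespace NOLOGGING COMPRESS AS
SELECT * FROM my_schema.my_table SAMPLE (0.1);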

How is nvarchar(n) stored in SQL Server?

Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value stored is smaller than N*2 bytes?
I have a huge table with many nvarchar fields of fixed declared length. Some are nvarchar(100) and some are nvarchar(400), etc.
The data in a column is never an exact size; it varies from 0 to N, and most of it is shorter than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9026424 rows.
The size of RecipientName alone would then be 800 * 9026424 = 6.72 GB,
but the actual storage size of the entire table is only 2.02 GB. Is some compression applied, or is a size smaller than N chosen?
NCHAR data type:
It is a fixed length data type.
It Occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NChar(20);
Set @Name = N'Sachin';
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name     Datalength In Bytes     Length
Sachin   40                      6
Even though only 6 characters were stored, the Datalength column shows 40 bytes because NCHAR is blank-padded to its full declared length of 20 characters, at 2 bytes per character.
Those 40 bytes are used irrespective of the actual length of the data stored.
NVARCHAR data type:
It is a variable length data type.
It Occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin';
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name     Datalength     Length
Sachin   12             6
Even though the declared size is 20, the Datalength column shows only 12 bytes because NVARCHAR stores just the 6 characters actually entered, at 2 bytes each.
Those 12 bytes depend on the actual data, not on the length given in the declaration.
Hope this is helpful :)
Yes,
it may use less storage if the actual value to be stored is smaller than N*2 bytes.
n just sets the maximum number of characters that can be stored in the field; the storage used corresponds to the number of characters you actually put in.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
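A quick hedged illustration of that byte limit (an added example, not from the original answer):
DECLARE @t nvarchar(100) = REPLICATE(N'x', 100)  -- 100 two-byte characters
SELECT DATALENGTH(@t) AS bytes_used              -- 200: the declared limit of N[VAR]CHAR(100)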
If you enable ROW or DATA Compression (this is a per-Index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm which is somewhat complex so not easy to calculate what it would be. And I believe that the MAX types don't allow for compression.
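For instance, ROW compression is enabled per table/index via a rebuild; a minimal sketch, assuming a hypothetical table named MyTable:
ALTER TABLE MyTable REBUILD WITH (DATA_COMPRESSION = ROW)  -- MyTable is a placeholder name
EXEC sp_spaceused 'MyTable'  -- compare against the figure from before the rebuild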
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).

SQL Server encryptbycert capping off

My problem is that I'm trying to encrypt a column in a SQL Server database because of policies of my work place. I have access only to simple methods for encrypting (TDE seems out of my possibilities) so I've tried using EncryptByCert or EncryptByKey. I was doing fine since the documentation shows the cap at 8000 which is enough for the data we're saving.
It just so happens that when I try to save anything, it caps off at around 200 characters, producing a 514-byte varbinary. That 514-byte varbinary will encrypt and decrypt fine, but it will not grow or shrink; a single character produces the same 514-byte value as a 200-character string. Beyond roughly 230 characters to encrypt, it just leaves the column null.
Does anyone know what's happening with that?
Encryption performed by these methods is done in chunks, with the maximum chunk size being the key length minus some internal overhead (117 bytes for 1024-bit keys, and 245 bytes for the 2048-bit keys first introduced in SQL Server 2016).
If your input is any larger than that, you have to split it into chunks, encrypt them one at a time, and concatenate the results.
Decryption, of course, should be performed accordingly. However, an important difference between the two is that the encryption chunk size is smaller than the key size, while for decryption it must be exactly the key size. That's because any chunk of data, however short, is encrypted into a key-length block, so no guesses about the input length can be made by looking at the output.
Here is an excerpt from my encryption function (written for the 2012 version, so 1024-bit keys are assumed):
create function [dbo].[security_EncryptByCert]
(
    @ClearText varbinary(max)
)
returns varbinary(max) with schemabinding, returns null on null input as begin
    -- Returned value
    declare @Ret varbinary(max) = 0x,
            -- Length of the cleartext
            @Lng int = datalength(@ClearText),
            -- Starting offset of the chunk to encrypt
            @i int = 1,
            -- Chunk size; it can't be more than 117 bytes with a 1024-bit key
            @Size int = 100,
            -- Certificate to encrypt data with
            @CertId int;
    -- Determine the certificate with which to perform encryption
    select @CertId = Id from ...
    -- Iterate chunk by chunk till the end of the text
    -- (<= so that a trailing chunk of a single byte is not skipped)
    while @i <= @Lng begin
        set @Ret += encryptbycert(@CertId, substring(@ClearText, @i, @Size));
        -- Move the pointer to the next chunk
        set @i += @Size;
    end;
    return @Ret;
end;
In this case, I used 100-byte chunks, not the largest possible ones. I don't really remember why, but you could use 245 bytes as the limit on 2016.
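For completeness, here is a hedged sketch of what a matching decryption function might look like (not part of the original answer; it mirrors the structure above and assumes a 1024-bit key, so each ciphertext chunk is exactly 128 bytes):
create function [dbo].[security_DecryptByCert]
(
    @CipherText varbinary(max)
)
returns varbinary(max) with schemabinding, returns null on null input as begin
    declare @Ret varbinary(max) = 0x,
            @Lng int = datalength(@CipherText),
            @i int = 1,
            -- Each encrypted chunk is exactly the key length: 128 bytes for a 1024-bit key
            @Size int = 128,
            @CertId int;
    -- Same certificate lookup as in the encryption function above
    select @CertId = Id from ...
    while @i <= @Lng begin
        set @Ret += decryptbycert(@CertId, substring(@CipherText, @i, @Size));
        set @i += @Size;
    end;
    return @Ret;
end;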

What is the maximum characters for the NVARCHAR(MAX)? [duplicate]

This question already has answers here:
What is the maximum number of characters that nvarchar(MAX) will hold?
I have declared a column of type NVARCHAR(MAX) in SQL Server 2008. What would be its exact maximum number of characters, given MAX as the length?
The maximum size for a column of type NVARCHAR(MAX) is 2 GB of storage.
Since NVARCHAR uses 2 bytes per character, that's approximately 1 billion characters.
Leo Tolstoy's War and Peace is a 1,440-page book containing about 600,000 words, so perhaps 6 million characters, generously rounded up. You could therefore fit about 166 copies of the entire War and Peace book into each NVARCHAR(MAX) column.
Is that enough space for your needs? :-)
By default, nvarchar(MAX) values are stored exactly the same as nvarchar(4000) values would be, unless the actual length exceeds 4000 characters; in that case, the in-row data is replaced by a pointer to one or more separate pages where the data is stored.
If you anticipate data possibly exceeding 4000 characters, nvarchar(MAX) is definitely the recommended choice.
Source: https://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/d5e0c6e5-8e44-4ad5-9591-20dc0ac7a870/
From MSDN Documentation
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB).
The storage size, in bytes, is two times the actual length of data entered + 2 bytes
I think nvarchar(MAX) can actually store approximately 1,073,741,822 characters.
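A quick hedged sanity check of that figure, derived from the 2^31-1 byte limit quoted above (minus 2 bytes of overhead, divided by 2 bytes per character):
SELECT (POWER(CAST(2 AS bigint), 31) - 1 - 2) / 2 AS max_nvarchar_max_characters  -- 1,073,741,822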

Does a char occupy 1 byte in a database?

Does a char occupy 1 byte in a database?
EDIT:
If I define a column as varchar(1), will it reserve 1 or 2 bytes for me?
CHAR(k) takes k bytes no matter what the value is;
VARCHAR(k) takes n+1 bytes, where n is the number of characters in the value, so at most k+1 bytes.
Value        CHAR(4)   Storage Required   VARCHAR(4)   Storage Required
''           '    '    4 bytes            ''           1 byte
'ab'         'ab  '    4 bytes            'ab'         3 bytes
'abcd'       'abcd'    4 bytes            'abcd'       5 bytes
'abcdefgh'   'abcd'    4 bytes            'abcd'       5 bytes
http://dev.mysql.com/doc/refman/5.1/en/char.html
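A small hedged sketch along the lines of that manual page (MySQL syntax; the table and column names are made up):
CREATE TABLE char_test (c CHAR(4), v VARCHAR(4)) CHARACTER SET latin1;
INSERT INTO char_test VALUES ('ab', 'ab');
SELECT LENGTH(c) AS char_bytes, LENGTH(v) AS varchar_bytes FROM char_test;
-- LENGTH(c) returns 2 because trailing pad spaces are stripped on retrieval;
-- on disk the CHAR(4) column still reserves 4 bytes, while the VARCHAR(4) column uses 2 bytes plus 1 length byte.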
It depends on what kind of char it is: if the type is char/varchar, then 1 byte per character; if it is a Unicode type (nchar/nvarchar), then most probably 2 bytes per character.
It depends on the RDBMS, and how you define the column. You certainly could define one that only requires one byte of storage space [in SQL Server, it'd be CHAR(1)]. Overhead for row headers, null bitmasks, possibly index uniquification, and lots of other cruft can complicate things, but yes, you should be able to create a column that's one byte wide.
Yes, if you specify the length of the char field as one and the database uses a codepage-based character mapping, so that each character is represented as one byte.
If the database is set up to use UTF-8 for storing characters, for example, each character will take anywhere from one to four bytes depending on which character it is.
However, the char data type is rather old, and some databases may actually store a char(1) field the same way as a varchar(1) field. In that case the field also needs a length, so it will take up at least one or two bytes depending on whether it's a space you store in the field (which will be stored as an empty string), and possibly more depending on the database.

Resources