Netezza: ERROR: 65536 : Record size limit exceeded

Can someone please explain the behavior below?
KAP.ADMIN(ADMIN)=> create table char1 ( a char(64000),b char(1516));
CREATE TABLE
KAP.ADMIN(ADMIN)=> create table char2 ( a char(64000),b char(1517));
ERROR: 65536 : Record size limit exceeded
KAP.ADMIN(ADMIN)=> insert into char1 select * from char1;
ERROR: 65540 : Record size limit exceeded
Why does this error occur during the insert when CREATE TABLE did not throw any error for the same table, as shown above?
KAP.ADMIN(ADMIN)=> \d char1
Table "CHAR1"
Attribute | Type | Modifier | Default Value
-----------+------------------+----------+---------------
A | CHARACTER(64000) | |
B | CHARACTER(1516) | |
Distributed on hash: "A"
./nz_ddl_table KAP char1
Creating table: "CHAR1"
CREATE TABLE CHAR1
(
A character(64000),
B character(1516)
)
DISTRIBUTE ON (A)
;
/*
Number of columns 2
(Variable) Data Size 4 - 65520
Row Overhead 28
====================== =============
Total Row Size (bytes) 32 - 65548
*/
I would like to know how the row size is calculated in the case above.
I checked the Netezza database user guide, but I was not able to follow its calculation for this example.

I think this link does a good job of explaining the overhead of Netezza / PDA data types:
For every row of every table, there is a 24-byte fixed overhead of the rowid, createxid, and deletexid. If you have any nullable columns, a null vector is required and it is N/8 bytes, where N is the number of columns in the record. The system rounds up the size of this header to a multiple of 4 bytes.
In addition, the system adds a record header of 4 bytes if any of the following is true:
Column of type VARCHAR
Column of type CHAR where the length is greater than 16 (stored internally as VARCHAR)
Column of type NCHAR
Column of type NVARCHAR
Using UTF-8 encoding, each Unicode code point can require 1 - 4 bytes of storage. A 10-character string requires 10 bytes of storage if it is ASCII and up to 20 bytes if it is Latin, or as many as 40 bytes if it is Kanji.
The only time a record does not contain a header is if all the columns are defined as NOT NULL, there are no character data types larger than 16 bytes, and no variable character data types.
https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_data_types_calculate_row_size.html
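Applying that to CHAR1 above reproduces the numbers nz_ddl_table reports (a sketch of the arithmetic; treat the exact split as an interpretation, since the guide's accounting is terse). Both columns are nullable CHARs longer than 16, so they are stored internally as VARCHARs and the row carries the variable-length record header:
Row overhead: 24 bytes fixed + 1 byte of null vector for the 2 columns, rounded up to a multiple of 4 = 28 bytes.
Maximum data: 64000 + 1516 bytes of character data + 4 bytes of variable-length bookkeeping (the record header added for VARCHAR-like columns) = 65520 bytes.
Maximum row: 28 + 65520 = 65548 bytes.
That maximum row of 65548 bytes is above the 65,535-byte row limit, which is presumably why the INSERT is rejected: the worst-case record is too wide, even though the CREATE TABLE check accepted this particular column combination.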

First create a temp table based on one row of data.
create temp table tmptable as
select *
from Table
limit 1
Then check the used bytes of the temp table. That should be the size per row.
select used_bytes
from _v_sys_object_storage_size a inner join
_v_table b
on a.tblid = b.objid
and b.tablename = 'TMPTABLE' -- unquoted identifiers are stored in upper case in the catalog
Netezza has some limitations:
1) Maximum number of characters in a char/varchar field: 64,000
2) Maximum row size: 65,535 bytes
A record longer than 65,535 bytes is simply not possible in Netezza.
Although a Netezza box offers a huge amount of space, it is a much better idea to size columns from accurate space forecasting rather than padding them arbitrarily. For your requirement, check whether every attribute really needs char(64000) or whether the lengths can be compacted based on analysis of the real data. If they can, revisit the attribute lengths.
Also, for requirements like this, avoid plain insert into char1 select * ... statements, because that lets the system choose its preferred data types, which tend toward the larger end of the sizing and may not be necessary.

Related

Space consumed by a particular column's data and impact on deleting that column

I am using an Oracle 12c database in my project, and I have a column "Name" of type VARCHAR2(128 CHAR) NOT NULL. I have approximately 25328687 rows in my table.
Now I don't need the "Name" column any more, so I want to drop it. When I calculated the total size of the data in this column (using lengthb and vsize) across all rows, it was approximately 1.07 GB.
Since the maximum size of the data in this column is specified, won't every row be allocated 128 bytes for this column (ignoring Unicode for simplicity), so that the total space consumed by the column should be 128 * number of rows = 3242071936 bytes, or about 3.24 GB?
Oracle allocates storage for VARCHAR2 dynamically (the definition says it is a variable-length string data type).
The CHAR data type is a fixed-length string data type.
create table x (a char(5), b varchar2(5));
insert into x values ('RAM', 'RAM');
insert into x values ('RAMA', 'RAMA');
insert into x values ('RAMAN', 'RAMAN');
SELECT * FROM X WHERE length(a) = 3; -> this will return 0 records
SELECT * FROM X WHERE length(b) = 3; -> this will return 1 record (RAM)
SELECT length(a) len_a, length(b) len_b from x;
The output will look like this:
len_a | len_b
-------------
5 | 3
5 | 4
5 | 5
Oracle does dynamic allocation for VARCHAR2.
So a string of 4 characters will take 5 bytes: one byte for the length and 4 bytes for the characters, assuming a single-byte character set.
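To see this on the sample table x above, VSIZE shows the internal storage of each value (a quick sketch; VSIZE reports the data bytes of the stored value, not the length byte kept in the row piece):
-- CHAR(5) is always blank-padded to 5 bytes; VARCHAR2(5) only stores the data (3, 4 and 5 bytes here)
SELECT a, vsize(a) AS bytes_a,
       b, vsize(b) AS bytes_b
FROM x;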
As the other answers say, the storage that a VARCHAR2 column uses is VARying. To get an estimate of the actual amount, you can use
1) The data dictionary
SELECT column_name, avg_col_len, last_analyzed
FROM ALL_TAB_COL_STATISTICS
WHERE owner = 'MY_SCHEMA'
AND table_name = 'MY_TABLE'
AND column_name = 'MY_COLUMN';
The result avg_col_len is the average column length. Multiply it by your number of rows, 25328687, and you get an estimate of roughly how many bytes this column uses. (If last_analyzed is NULL or very old compared to the last big data change, you'll have to refresh the optimizer stats with DBMS_STATS.GATHER_TABLE_STATS('MY_SCHEMA','MY_TABLE') first.)
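If you prefer, the multiplication can be done entirely in the dictionary (a sketch; it assumes the statistics in ALL_TAB_COL_STATISTICS and the row count in ALL_TABLES are current):
SELECT c.avg_col_len * t.num_rows AS estimated_bytes
FROM all_tab_col_statistics c
JOIN all_tables t
  ON t.owner = c.owner
 AND t.table_name = c.table_name
WHERE c.owner = 'MY_SCHEMA'
  AND c.table_name = 'MY_TABLE'
  AND c.column_name = 'MY_COLUMN';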
2) Count it yourself from a sample
SELECT sum(s), count(*), avg(s), stddev(s)
FROM (
SELECT vsize(my_column) as s
FROM my_schema.my_table SAMPLE (0.1)
);
This calculates the storage size of a 0.1 percent sample of your table.
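To turn that sample into a full-table estimate, scale by the sampling factor (a rough sketch; SAMPLE (0.1) reads roughly 0.1 percent of the rows, so the scaled total is only an approximation):
SELECT sum(vsize(my_column)) * 1000 AS estimated_total_bytes
FROM my_schema.my_table SAMPLE (0.1);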
3) To know for sure, I'd do a test with a subset of the data
CREATE TABLE my_test TABLESPACE my_scratch_tablespace NOLOGGING AS
SELECT * FROM my_schema.my_table SAMPLE (0.1);
-- get the size of the test table in megabytes
SELECT round(bytes/1024/1024) as mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
-- now drop the column
ALTER TABLE my_test DROP (my_column);
-- and measure again
SELECT round(bytes/1024/1024) as mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
-- check how much space will be freed up
ALTER TABLE my_test MOVE;
SELECT round(bytes/1024/1024) as mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
You could improve the test by using the same PCTFREE and COMPRESSION levels on your test table.
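For example, the test table could be created like this (a sketch with assumed settings; substitute the PCTFREE value and compression clause that your real table actually uses):
CREATE TABLE my_test
  PCTFREE 10
  TABLESPACE my_scratch_tablespace
  NOLOGGING
  ROW STORE COMPRESS ADVANCED
AS
SELECT * FROM my_schema.my_table SAMPLE (0.1);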

Is it possible to determine ENCRYPTBYKEY maximum returned value by the clear text type?

I am going to encrypt several fields in an existing table. Basically, the following encryption technique is going to be used:
CREATE MASTER KEY ENCRYPTION
BY PASSWORD = 'sm_long_password#'
GO
CREATE CERTIFICATE CERT_01
WITH SUBJECT = 'CERT_01'
GO
CREATE SYMMETRIC KEY SK_01
WITH ALGORITHM = AES_256 ENCRYPTION
BY CERTIFICATE CERT_01
GO
OPEN SYMMETRIC KEY SK_01 DECRYPTION
BY CERTIFICATE CERT_01
SELECT ENCRYPTBYKEY(KEY_GUID('SK_01'), 'test')
CLOSE SYMMETRIC KEY SK_01
DROP SYMMETRIC KEY SK_01
DROP CERTIFICATE CERT_01
DROP MASTER KEY
ENCRYPTBYKEY returns varbinary with a maximum size of 8,000 bytes. Knowing the table fields that are going to be encrypted (for example: nvarchar(128), varchar(31), bigint), how can I determine the length of the new varbinary columns?
You can see the full specification here.
So let's calculate:
16 bytes key GUID
 4 bytes header
16 bytes IV (for AES, a 16-byte block cipher)
Plus then the size of the encrypted message:
 4 bytes magic number
 2 bytes integrity bytes length
 0 bytes integrity bytes (warning: may be wrongly placed in the table)
 2 bytes (plaintext) message length
 m bytes (plaintext) message
CBC padding bytes
The CBC padding bytes should be calculated the following way:
16 - ((m + 4 + 2 + 2) % 16)
as padding is always applied. This will result in a number of padding bytes in the range 1..16. A sneaky shortcut is to just add 16 bytes to the total, but this may mean that you're specifying up to 15 bytes that are never used.
We can shorten this to 36 + 8 + m + 16 - ((m + 8) % 16), or 60 + m - ((m + 8) % 16). Or, if you use the little trick specified above and you don't care about the wasted bytes: 76 + m, where m is the size of the message input in bytes.
Notes:
beware that the first byte in the header contains the version number of the scheme; this answer does not and cannot specify how many bytes will be added or removed if a different internal message format or encryption scheme is used;
using integrity bytes is highly recommended in case you want to protect your DB fields against change (keeping the amount of money in an account confidential is less important than making sure the amount cannot be changed).
The example on the page assumes single byte encoding for text characters.
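As a quick sanity check against the setup from the question (a sketch; it assumes the SK_01 key and CERT_01 certificate created above still exist), the predicted and actual sizes can be compared directly:
OPEN SYMMETRIC KEY SK_01 DECRYPTION BY CERTIFICATE CERT_01;
DECLARE @msg varchar(50) = 'test';  -- 4 single-byte characters, so m = 4
SELECT DATALENGTH(ENCRYPTBYKEY(KEY_GUID('SK_01'), @msg)) AS actual_bytes,
       60 + LEN(@msg) - ((LEN(@msg) + 8) % 16)           AS predicted_bytes;
CLOSE SYMMETRIC KEY SK_01;
For this input both values usually come out as 52; the answer below notes the occasional larger results.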
Based upon some tests in SQL Server 2008, the following formula seems to work. Note that @ClearText is a VARCHAR:
52 + (16 * ((LEN(@ClearText) + 8) / 16))
This is roughly compatible with the answer by Maarten Bodewes, except that my tests showed the DATALENGTH(myBinary) to always be of the form 52 + (z * 16), where z is an integer.
LEN(myVarCharString) DATALENGTH(encryptedString)
-------------------- -----------------------------------------
0 through 7 usually 52, but occasionally 68 or 84
8 through 23 usually 68, but occasionally 84
24 through 39 usually 84
40 through 50 100
The "myVarCharString" was a table column defined as VARCHAR(50). The table contained 150,000 records. "Occasionally" means roughly 1 out of 10,000 records would get bumped into a higher bucket; very strange. For LEN() of 24 and higher, there were not enough records to observe the anomaly.
Here is some Perl code that takes a proposed length for "myVarCharString" as input from the terminal and produces an expected size for the EncryptByKey() result. Perl's int() is equivalent to Math.floor() for the non-negative values used here.
while ($len = <>) {
    print 52 + ( 16 * int( ($len + 8) / 16 ) ), "\n";
}
You might want to use this formula to calculate a size, then add 16 to allow for the anomaly.
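Applied to the column types from the question, that works out roughly as follows (a sketch; m must be the maximum plaintext size in bytes, so nvarchar(128) contributes 256 bytes, varchar(31) 31 bytes, and bigint 8 bytes):
SELECT col,
       52 + 16 * ((max_bytes + 8) / 16) + 16 AS suggested_varbinary_length
FROM (VALUES ('nvarchar(128)', 256),
             ('varchar(31)',    31),
             ('bigint',          8)) AS v(col, max_bytes);
-- nvarchar(128) -> varbinary(324), varchar(31) -> varbinary(100), bigint -> varbinary(84)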

How is nvarchar(n) stored in SQL Server?

Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value to be stored is smaller than N*2 bytes?
I have a huge table with many fields of fixed nvarchar type. Some are nvarchar(100) and some are nvarchar(400) etc.
The data in a column is never exactly the declared size; it varies from 0 to N. Most of the data is less than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9026424 rows.
If every value took the full width, RecipientName alone would be 800 * 9026424 = 6.72 GB,
but the actual storage size of the entire table is only 2.02 GB. Is some compression applied, or is some size smaller than N (a power of 2, perhaps) chosen?
NCHAR data type:
It is a fixed length data type.
It Occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NChar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name Datalength Length
Sachin 40 6
Even though the declared size is 20, the Datalength column shows 40 bytes because NCHAR uses 2 bytes for each of the 20 declared characters.
These 40 bytes are occupied regardless of the actual length of the data stored.
NVARCHAR data type:
It is a variable length data type.
It Occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name Datalength Length
Sachin 12 6
Even though the declared size is 20, the Datalength column shows 12 bytes because NVARCHAR uses 2 bytes for each character actually stored (6 characters * 2 bytes).
These 12 bytes depend on the data, not on the declared length.
Hope this is helpful :)
Yes,
it may use less storage if the actual value to be stored is smaller than N*2 bytes.
n just specifies the maximum number of characters that can be stored in this field; the number of characters stored is equal to the number of characters you actually pass in.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
If you enable ROW or DATA Compression (this is a per-Index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm which is somewhat complex so not easy to calculate what it would be. And I believe that the MAX types don't allow for compression.
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).
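To see this effect on the RecipientName column from the question, the bytes actually stored can be summed up directly (a sketch; dbo.MyHugeTable stands in for the real table name):
SELECT COUNT(*)                                       AS row_count,
       AVG(CAST(DATALENGTH(RecipientName) AS bigint)) AS avg_bytes_per_value,
       SUM(CAST(DATALENGTH(RecipientName) AS bigint)) AS total_data_bytes
FROM dbo.MyHugeTable;
The total will normally land well below the 6.72 GB worst case; it also won't match the 2.02 GB table size exactly, since that figure includes the other columns, row overhead, and any compression.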

Calculate storage space used by sql server sql_variant data type to store fixed length data types

I'm trying to calculate the storage space used by sql_variant to store fixed length data types.
For my test I created a table with two columns:
Key int identity(1,1) primary key
Value sql_variant
I added one row with Value 1 of type int, and I used DBCC PAGE to check the size of the row, which turned out to be 21 bytes.
Using Estimate the Size of a Clustered Index I have:
Null_bitmap = 3
Fixed_Data_Size = 4 (Key column int)
Variable_Data_Size = 2 + 2 + 4 (Value column with an int)
Row_Size = 4 + 8 + 3 + 4 = 19 bytes
Why does the row take 21 bytes? What am I missing in my calculation?
I tried the same analysis with a table using an int column instead of the sql_variant, and the used byte count reported by DBCC PAGE is 15, which matches my calculation:
Null_bitmap = 3
Fixed_Data_Size = 8 (Key column int, Value column int)
Variable_Data_Size = 0
Row_Size = 4 + 8 + 3 = 15 bytes
The extra space is the sql_variant metadata information. From the BOL:
http://msdn.microsoft.com/en-us/library/ms173829.aspx
Each instance of a sql_variant column records the data value and the metadata information. This includes the base data type, maximum size, scale, precision, and collation.
For compatibility with other data types, the catalog objects, such as the DATALENGTH function, that report the length of sql_variant objects report the length of the data. The length of the metadata that is contained in a sql_variant object is not returned.
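That split between data and metadata is visible from T-SQL (a small sketch): DATALENGTH reports only the data bytes, while SQL_VARIANT_PROPERTY exposes the stored metadata.
DECLARE @v sql_variant = CAST(1 AS int);
SELECT DATALENGTH(@v)                         AS data_bytes,   -- 4, the int value only
       SQL_VARIANT_PROPERTY(@v, 'BaseType')   AS base_type,
       SQL_VARIANT_PROPERTY(@v, 'TotalBytes') AS total_bytes;  -- data plus metadata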
You missed part 7:
7. Calculate the number of rows per page (8096 free bytes per page):
Rows_Per_Page = 8096 / (Row_Size + 2)
Because rows do not span pages, the number of rows per page should be
rounded down to the nearest whole row. The value 2 in the formula is
for the row's entry in the slot array of the page.
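Plugging in the 21-byte row from the question (a quick sketch of the arithmetic): Rows_Per_Page = 8096 / (21 + 2) = 352 rows per page, since each row also needs its 2-byte slot array entry.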

Should I use an inline varchar(max) column or store it in a separate table?

I want to create a table in MS SQL Server 2005 to record details of certain system operations. As you can see from the table design below, every column apart from Details is non-nullable.
CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[Details] [varchar](max) NULL
)
Because the Details column won't always have data in it, is it more efficient to store this column in a separate table and provide a link to it instead?
CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[DetailID] [int] NULL
)
CREATE TABLE [Detail]
(
[DetailID] [int] IDENTITY(1,1) NOT NULL,
[Details] [varchar](max) NOT NULL
)
For a smaller data type I wouldn't really consider it, but for a varchar(max), does doing this help keep the table size smaller? Or am I just trying to outsmart the database and achieving nothing?
Keep it inline. Under the covers SQL Server already stores the MAX columns in a separate 'allocation unit' since SQL 2005. See Table and Index Organization. This in effect is exactly the same as keeping the MAX column in its own table, but w/o any disadvantage of explicitly doing so.
Having an explicit table would actually be both slower (because of the foreign key constraint) and consume more space (because of the DetailID duplication). Not to mention that it requires more code, and bugs are introduced by... writing code.
(Diagram from Table and Index Organization: http://i.msdn.microsoft.com/ms189051.3be61595-d405-4b30-9794-755842d7db7e(en-us,SQL.100).gif)
Update
To check the actual location of data, a simple test can show it:
use tempdb;
go
create table a (
id int identity(1,1) not null primary key,
v_a varchar(8000),
nv_a nvarchar(4000),
m_a varchar(max),
nm_a nvarchar(max),
t text,
nt ntext);
go
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('v_a', N'nv_a', 'm_a', N'nm_a', 't', N'nt');
go
select %%physloc%%,* from a
go
The %%physloc%% pseudo column will show the actual physical location of the row, in my case it was page 200:
dbcc traceon(3604)
dbcc page(2,1, 200, 3)
Slot 0 Column 2 Offset 0x19 Length 3 Length (physical) 3
v_a = v_a
Slot 0 Column 3 Offset 0x1c Length 8 Length (physical) 8
nv_a = nv_a
m_a = [BLOB Inline Data] Slot 0 Column 4 Offset 0x24 Length 3 Length (physical) 3
m_a = 0x6d5f61
nm_a = [BLOB Inline Data] Slot 0 Column 5 Offset 0x27 Length 8 Length (physical) 8
nm_a = 0x6e006d005f006100
t = [Textpointer] Slot 0 Column 6 Offset 0x2f Length 16 Length (physical) 16
TextTimeStamp = 131137536 RowId = (1:182:0)
nt = [Textpointer] Slot 0 Column 7 Offset 0x3f Length 16 Length (physical) 16
TextTimeStamp = 131203072 RowId = (1:182:1)
All column values but the TEXT and NTEXT were stored inline, including the MAX types.
After changing the table options and insert a new row (sp_tableoption does not affect existing rows), the MAX types were evicted into their own storage:
sp_tableoption 'a' , 'large value types out of row', '1';
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('2v_a', N'2nv_a', '2m_a', N'2nm_a', '2t', N'2nt');
dbcc page(2,1, 200, 3);
Note how m_a and nm_a columns are now a Textpointer into the LOB allocation unit:
Slot 1 Column 2 Offset 0x19 Length 4 Length (physical) 4
v_a = 2v_a
Slot 1 Column 3 Offset 0x1d Length 10 Length (physical) 10
nv_a = 2nv_a
m_a = [Textpointer] Slot 1 Column 4 Offset 0x27 Length 16 Length (physical) 16
TextTimeStamp = 131268608 RowId = (1:182:2)
nm_a = [Textpointer] Slot 1 Column 5 Offset 0x37 Length 16 Length (physical) 16
TextTimeStamp = 131334144 RowId = (1:182:3)
t = [Textpointer] Slot 1 Column 6 Offset 0x47 Length 16 Length (physical) 16
TextTimeStamp = 131399680 RowId = (1:182:4)
nt = [Textpointer] Slot 1 Column 7 Offset 0x57 Length 16 Length (physical) 16
TextTimeStamp = 131465216 RowId = (1:182:5)
For completeness' sake, we can also force one of the non-MAX fields out of row:
update a set v_a = replicate('X', 8000);
dbcc page(2,1, 200, 3);
Note how the v_a column is stored in the Row-Overflow storage:
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
v_a = [BLOB Inline Root] Slot 0 Column 2 Offset 0x19 Length 24 Length (physical) 24
Level = 0 Unused = 99 UpdateSeq = 1
TimeStamp = 1098383360
Link 0
Size = 8000 RowId = (1:176:0)
So, as others have already commented, the MAX types are stored inline by default, if they fit. For many DW projects this would be unacceptable, because typical DW loads must scan or at least range scan, so sp_tableoption ..., 'large value types out of row', '1' should be used. Note that this does not affect existing rows (in my test, not even on an index rebuild), so the option has to be turned on early.
For most OLTP type loads though the fact that MAX types are stored inline if possible is actually an advantage, since the OLTP access pattern is to seek and the row width makes little impact on it.
Nonetheless, regarding the original question: a separate table is not necessary. Turning on the large value types out of row option achieves the same result at no cost in development/test effort.
Paradoxically, if your data is normally less than 8000 characters, I would store it in a separate table, while if the data is greater than 8000 characters, I would keep it in the same table.
This is because SQL Server keeps the data in the page if the row still fits in a single page, but when the data gets larger it moves it out, just like the TEXT data type, and leaves only a pointer in the row. So for a bunch of 3000-character rows you fit fewer rows per page, which is really inefficient, but for a bunch of 12000-character rows the data is out of the row, so it's actually more efficient.
Having said this, you typically have a wide-ranging mix of lengths, and thus I would move it into its own table. This also gives you flexibility to move that table to a different filegroup, etc.
Note that you can also force the data out of the row using sp_tableoption, as sketched below. varchar(max) is basically similar to the TEXT data type, except that it defaults to data in row (for varchar(max)) instead of defaulting to data out of row (for TEXT).
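For the Log table in the question, pushing Details out of row would look something like this (a sketch reusing the option shown above; it only affects rows written after the option is set):
EXEC sp_tableoption 'Log', 'large value types out of row', '1';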
You should structure your data into whatever seems the most logical structure and allow SQL Server to perform its optimizations as to how to physically store the data.
If you find, through performance analysis, that your structure is a performance problem, then consider performing changes to your structure or to storage settings.
Keep it inline. The whole point of varchar is that it takes up next to no space if it's empty and only about as many bytes as the data it actually holds (5 bytes for 'Hello', plus a small length overhead), and so on.
I would normalize it by creating the Detail table. I assume some of the entries in Log will share the same Detail? If you normalize it, you store only an integer FK instead of repeating the text for every occurrence, because the text lives once in the Detail table. If you have reasons to denormalize, do it, but from your question I don't see that being the case.
Having nullable columns costs 2 bytes for every 16 of them. If this is the only (or 17th, or 33rd, etc.) nullable column in the table, it will cost you 2 bytes per row, otherwise nothing.
