According to the SQL Server documentation for sparse columns:
Consider using sparse columns when the space saved is at least 20 percent to 40 percent.
And there are tables that show the percentage of data that must be NULL to achieve a "net space savings of 40%" based on the size of the data type. Here's a summary:
Nonsparse bytes   Sparse bytes   NULL percentage
---------------   ------------   ---------------
 1                 5              86%
 2                 6              76%
 3                 7              69%
 4                 8              64%
 5                 9              60%
 6                10              57%
 8 (bigint)       12              52%
10                14              49%
Questions:
How were the above NULL percentages calculated?
I'm having trouble figuring out the math. For example, consider the row for bigint. If 52% of values in a sparse bigint column are NULL, then the 48% of non-NULL values should require 12 bytes × 48% = 5.76 bytes on average, which is 8 − 5.76 = 2.24 bytes less compared to a nonsparse column. But this is a savings of 2.24 bytes ÷ 8 bytes = 28%, not 40%.
What percentage of data must be NULL to achieve a space savings of 20%?
The post Why & When should I use SPARSE COLUMN? doesn't answer either question.
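For what it's worth, here is the arithmetic from the first question written out as a sketch. It assumes the simple model avg bytes per value = (1 - NULL%) * (sparse bytes); it reproduces the 28% figure rather than 40%, so it is evidently not the model the documentation used, but it shows what the thresholds would be under that model:

DECLARE @nonsparse float = 8;    -- bigint, nonsparse bytes
DECLARE @sparse    float = 12;   -- bigint, sparse bytes (8 + 4 bytes of sparse overhead)
DECLARE @nullpct   float = 0.52; -- the documented threshold for bigint

SELECT AvgBytesPerValue   = (1 - @nullpct) * @sparse,                   -- 5.76
       SavingsVsNonsparse = 1 - (1 - @nullpct) * @sparse / @nonsparse,  -- 0.28, not 0.40
       NullPctFor40Pct    = 1 - 0.60 * @nonsparse / @sparse,            -- 0.60 under this model
       NullPctFor20Pct    = 1 - 0.80 * @nonsparse / @sparse;            -- 0.4667 under this model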
As is evident from the MSDN description of decimal, certain precision ranges have the same number of storage bytes assigned to them.
What I don't understand is why the widths of the ranges differ. The range from 1 to 9 (5 storage bytes) has a width of 9, while the range from 10 to 19 (9 storage bytes) has a width of 10. Then the next range (13 storage bytes) has a width of 9 again, while the one after that has a width of 10 again.
Since the storage bytes increase by 4 every time, I would have expected all of the ranges to be the same width. Or maybe the first one to be smaller, to reserve space for the sign or something, but equal in width from then on. But it goes from 9 to 10 to 9 to 10 again.
What's going on here? And if it existed, would 21 storage bytes have a precision range of 39-47, i.e. is the pattern 9-10-9-10-9-10...?
would 21 storage bytes have a precision range of 39-47
No. 2^160 = 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976, which has 49 decimal digits. So this hypothetical scenario would cater for a precision range of 39-48 (a 20-byte integer is not big enough to hold the full range of 49-digit numbers).
The first byte is reserved for the sign.
01 is used for positive numbers; 00 for negative.
The remainder stores the value as an integer, i.e. 1.234 would be stored as the integer 1234 (or some power-of-10 multiple of this, depending on the declared scale).
The length of the integer is either 4, 8, 12 or 16 bytes, depending on the declared precision. Some 10-digit integers can be stored in 4 bytes, but fitting the whole 10-digit range in would overflow 4 bytes, so it needs to go to the next step up.
And so on.
2^32 = 4,294,967,296 (10 digits)
2^64 = 18,446,744,073,709,551,616 (20 digits)
2^96 = 79,228,162,514,264,337,593,543,950,336 (29 digits)
2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 (39 digits)
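A quick way to see where those boundaries land (a sketch that just restates the reasoning above): the largest precision whose full range fits in an n-byte unsigned integer is FLOOR(8n * LOG10(2)).

SELECT FourBytes    = FLOOR( 32 * LOG10(2.0)),  --  9  (precision  1-9)
       EightBytes   = FLOOR( 64 * LOG10(2.0)),  -- 19  (precision 10-19)
       TwelveBytes  = FLOOR( 96 * LOG10(2.0)),  -- 28  (precision 20-28)
       SixteenBytes = FLOOR(128 * LOG10(2.0)),  -- 38  (precision 29-38)
       TwentyBytes  = FLOOR(160 * LOG10(2.0));  -- 48  (the hypothetical 39-48 range)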
To see this you need to use DBCC PAGE; casting the column to binary does not give you the storage representation. Or use a utility like SQL Server Internals Viewer.
CREATE TABLE T(
A DECIMAL( 9,0),
B DECIMAL(19,0),
C DECIMAL(28,0) ,
D DECIMAL(38,0)
);
INSERT INTO T VALUES
(999999999, 9999999999999999999, 9999999999999999999999999999, 99999999999999999999999999999999999999),
(-999999999, -9999999999999999999, -9999999999999999999999999999, -99999999999999999999999999999999999999);
DBCC PAGE shows how the first and second rows are stored. Note that the values after the sign byte are byte-reversed: 0x3B9AC9FF = 999999999.
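A quick cross-check that does not need page dumps (a sketch against the table above): DATALENGTH returns the number of bytes each value occupies, though not the byte layout, so it should report the sign byte plus the 4/8/12/16-byte integer.

SELECT DATALENGTH(A) AS A, DATALENGTH(B) AS B, DATALENGTH(C) AS C, DATALENGTH(D) AS D
FROM T;
-- 5, 9, 13, 17 for both rows: one sign byte plus a 4 / 8 / 12 / 16 byte integer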
I have a database in production and need to reduce its on-disk size. I followed the instructions to shrink the file, but the result was a bit surprising.
Sorry for all the numbers here, but I do not know how to express the problem any better.
The database contains only one table, with 11,634,966 rows.
The table structure is as follows (I have only changed the column names):
id bigint not null 8 -- the primary key (clustered index)
F1 int not null 4
F2 int not null 4
F3 int not null 4
F4 datetime not null 8
F5 int not null 4
F6 int not null 4
F7 int not null 4
F8 xml ?
F9 uniqueidentifier 16
F10 int 4
F11 datetime not null 8
Excluding the XML field I calculate that the data size will be 68 bytes per row.
I ran a query against the database to find the min, max and avg size of the XML field F8, showing the following:
min : 625 bytes
max : 81782 bytes
avg : 5321 bytes
The on-disk file is 108 GB after shrinking the database.
This translates to the following:
108G / 11.6M records = 9283 bytes per row
- 5321 bytes per row (Avg of XML)
= 3962 bytes per row
- 68 (data size of other fields in row)
= 3894 bytes per row. (must be overhead)
But this means that the overhead is 41.948%.
Is this to be expected? And is there anything I can do to reduce the 108 GB disk size?
BTW there is only one clustered index on the table.
And I am using SQL Server 2008 (SP3)
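One way to see where the space is actually going (a sketch; dbo.YourTable is a placeholder for the real table name) is to look at the page fullness and average record size per allocation unit, and at the overall space reservation:

SELECT index_id, index_type_desc, alloc_unit_type_desc,
       page_count, avg_page_space_used_in_percent, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'DETAILED');
-- IN_ROW_DATA vs LOB_DATA rows show how much of the table is XML;
-- a low avg_page_space_used_in_percent points at free space inside pages rather than column overhead

EXEC sp_spaceused 'dbo.YourTable';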
I have to calculate storage requirements in terms of blocks for the following table:
Bill(billno number(7),billdate date,ccode varchar2(20),amount number(9,2))
The table storage attributes are:
PCTFREE=20 , INITRANS=4 , PCTUSED=60 , BLOCKSIZE=8K , NUMBER OF ROWS=100000
I searched a lot on the internet and referred to many books, but didn't find anything.
First you need to figure out what the typical value for the varchar2 column is. The total size will depend on that. I created two tables from your BILL table: BILLMAX, where ccode always takes 20 characters ('12345678901234567890'), and BILLMIN, which always has NULL in ccode.
The results are:
TABLE_NAME   NUM_ROWS   AVG_ROW_LEN   BLOCKS
BILLMAX          3938            37       28
BILLMIN          3938            16       13
select table_name, num_rows, avg_row_len, blocks from user_tables
where table_name in ( 'BILLMIN', 'BILLMAX')
As you can see, the number of blocks depends on that. Use exec dbms_stats.GATHER_TABLE_STATS('YourSchema','BILL') to refresh values inside user_tables.
The other thing you need to take into consideration is how big your extents will be. For example:
STORAGE (
INITIAL 64K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
will generate the first 16 extents with a size of 8 blocks each. After that it will start to create extents with a size of 1 MB (128 blocks).
So, scaled up to the 100,000 rows in the question (100,000 / 3,938 is roughly 25.4), BILLMAX needs roughly 28 * 25.4 = ~711 blocks and BILLMIN roughly 13 * 25.4 = ~330; with this extent scheme BILLMAX will be allocated 768 blocks and BILLMIN 384 blocks.
As you can see the difference is quite big.
For BILLMAX : 16 * 8 + 128 * 5 = 768
For BILLMIN : 16 * 8 + 128 * 2 = 384
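To compare these estimates with what Oracle actually allocated, the segment-level figures can be checked as well (a sketch, again run against the two test tables):

SELECT segment_name, extents, blocks, bytes/1024/1024 AS mbytes
FROM user_segments
WHERE segment_name IN ('BILLMAX', 'BILLMIN');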
Is it possible to compute the size of data if I know its size when it's base64 encoded?
I've a file that is 450KB in size when base64 encoded but what size is it decompressed?
Is there a method to find output size without decompressing the file first?
I've a file that is 450KB in size when base64 encoded but what size is it decompressed?
In fact, you don't "decompress", you decode. The result will be smaller than the encoded data.
As Base 64 encoding needs ~ 8 bits for each 6 bits of the original data (or 4 bytes to store 3), the math is simple:
Encoded Decoded
450KB / 4 * 3 = ~ 337KB
The overhead between Base64 and the decoded string is nearly constant, 33.33%. I say "nearly" only because of the padding bytes at the end (=) that make the string length a multiple of 4. See some examples:
String              Encoded                     Len    B64   Pad   Space needed
A                   QQ==                          1      2     2        400.00%
AB                  QUI=                          2      3     1        200.00%
ABC                 QUJD                          3      4     0        133.33%
ABCD                QUJDRA==                      4      6     2        200.00%
ABCDEFGHIJKLMNOPQ   QUJDREVGR0hJSktMTU5PUFE=     17     23     1        140.00%
( 300 bytes )       ( 400 bytes )               300    400     0        133.33%
( 500 bytes )       ( 668 bytes )               500    666     2        133.60%
( 5000 bytes )      ( 6668 bytes )             5000   6666     2        133.36%
                                    ... tends to 133.33% ...
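The encoded lengths in the table can be computed directly from the decoded length, since every 3-byte group (the last one padded) becomes 4 output characters; a sketch of that arithmetic (written in T-SQL purely for illustration):

DECLARE @decoded_bytes int = 5000;
SELECT EncodedChars = 4 * CEILING(@decoded_bytes / 3.0);  -- 6668, as in the last row of the table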
Calculating the space for unencoded data:
Let's take the value QUJDREVGR0hJSktMTU5PUFE= mentioned above.
1. There are 24 bytes in the encoded value.
2. Let's calculate 24 / 4 * 3 => the result is 18.
3. Let's count the number of =s at the end of the encoded value: in this case, 1 (we only need to check the last 2 bytes of the encoded data).
4. Taking 18 (obtained in step 2) minus 1 (obtained in step 3), we get 17.
So, we need 17 bytes to store the data.
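The same steps written as a computation (again plain arithmetic, shown in T-SQL purely for illustration): count the trailing = characters and subtract them from len / 4 * 3.

DECLARE @b64 varchar(max) = 'QUJDREVGR0hJSktMTU5PUFE=';
DECLARE @pad int = LEN(@b64) - LEN(REPLACE(@b64, '=', ''));  -- number of trailing '=' (0, 1 or 2)
SELECT DecodedBytes = LEN(@b64) / 4 * 3 - @pad;              -- 24 / 4 * 3 - 1 = 17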
Base64 adds roughly a third to the original size, so your file should be more or less 0.75 * 450 KB in size.