I have a database in production and need to reduce its on-disk size. I followed the instructions to shrink the file, but the results were a bit surprising.
Sorry for all the numbers here, but I do not know how to express the problem any better.
The database contains only one table, with 11,634,966 rows.
The table structure is as follows (I have just changed the column names):
id bigint not null 8 -- the primary key (clustered index)
F1 int not null 4
F2 int not null 4
F3 int not null 4
F4 datetime not null 8
F5 int not null 4
F6 int not null 4
F7 int not null 4
F8 xml ?
F9 uniqueidentifier 16
F10 int 4
F11 datetime not null 8
Excluding the XML field, I calculate the data size to be 68 bytes per row (8 + 4 + 4 + 4 + 8 + 4 + 4 + 4 + 16 + 4 + 8).
I ran a query against the database to find the min, max, and avg size of the XML field F8, which showed the following:
min : 625 bytes
max : 81782 bytes
avg : 5321 bytes
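A query along these lines would produce those numbers (a sketch; dbo.MyTable stands in for the real table name):

SELECT MIN(DATALENGTH(F8))                 AS min_bytes,
       MAX(DATALENGTH(F8))                 AS max_bytes,
       AVG(CAST(DATALENGTH(F8) AS bigint)) AS avg_bytes  -- cast so the average's internal sum doesn't overflow int
FROM dbo.MyTable;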
The on-disk file is 108 GB after shrinking the database.
This translates to the following:
108G / 11.6M records = 9283 bytes per row
- 5321 bytes per row (Avg of XML)
= 3962 bytes per row
- 68 (data size of other fields in row)
= 3894 bytes per row. (must be overhead)
But this means that the overhead is 41.948%.
Is this to be expected, and is there anything I can do to reduce the 108 GB disk size?
BTW, the only index on the table is the clustered index.
And I am using SQL Server 2008 (SP3).
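For reference, a sketch of how to see where those bytes are actually allocated (in-row pages vs. LOB pages vs. row-overflow pages); dbo.MyTable is again a placeholder for the real table name:

SELECT SUM(in_row_used_page_count)       * 8 AS in_row_KB,
       SUM(lob_used_page_count)          * 8 AS lob_KB,          -- XML values that don't fit in the row are counted here
       SUM(row_overflow_used_page_count) * 8 AS row_overflow_KB,
       SUM(row_count)                        AS rows_counted
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.MyTable')
  AND index_id IN (0, 1);  -- heap or clustered index only

Pages are 8 KB, hence the * 8 to get KB.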
In an ~18-year-old application, users file "cases", and each case creates a row in a "journal" table in a database (on SQL 2000). These cases can be tagged with "descriptors", for which a hard-coded limit of 50 is set somewhere. The descriptors/tags are stored in a lookup table, and the keys for the descriptors are numbers from the power-of-two sequence (2^n).
This table looks like this:
key  descriptor
---  ----------
1    D 1
2    D 2
4    D 3
8    D 4
16   D 5
There are 50 rows, which means the biggest key is 562,949,953,421,312 (2^49). Each case can have up to 8 descriptors, which are unfortunately stored in a single column in the case journal table. The keys are stored as the sum of all the descriptors on that case.
A case with the descriptor D2 has 2 in the journal
A case with the descriptors D2 and D4 has 10
A case with the descriptors D1, D3 and D5 has 21
The journal has 100 million records. Now, for the first time in years, there is a requirement to analyze the journal by descriptor.
What would be a smart (mathematical) way to query the journal and get the results for one descriptor?
Edit:
In answer to the comment from @Squirrel:
jkey  jvalue  descriptors
----  ------  -----------
1     V 1     0
2     V 2     24
3     V 3     3
4     V 4     12
5     V 5     6
You need to use bitwise operators.
Assuming the column is bigint, then
where yourcolumn & 16 > 0
will find the ones matching D5, for example.
By the way, if you are trying this query with literal values larger than will fit into a signed 32-bit int, make sure you cast them to BIGINT, as they will otherwise be interpreted as the numeric data type by default, which cannot be used with bitwise operators.
WHERE yourcolumn & CAST(562949953421312 AS BIGINT) > 0
You may similarly need to cast yourcolumn if it is in fact numeric rather than bigint.
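Putting that together for the reporting requirement, a per-descriptor count could look roughly like this (table and column names are guesses based on the question; descriptors is the bitwise-sum column and [key] is the power-of-two key in the lookup table):

SELECT l.descriptor,
       COUNT(*) AS case_count
FROM journal AS j
JOIN descriptor_lookup AS l
  ON j.descriptors & l.[key] > 0
GROUP BY l.descriptor
ORDER BY case_count DESC;

Note that a bitwise predicate like this cannot use an index on descriptors, so expect a full scan over the 100 million rows.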
According to the SQL Server documentation for sparse columns:
Consider using sparse columns when the space saved is at least 20 percent to 40 percent.
And there are tables that show the percentage of data that must be NULL to achieve a "net space savings of 40%" based on the size of the data type. Here's a summary:
Nonsparse bytes Sparse bytes NULL percentage
--------------- ------------ ---------------
1 5 86%
2 6 76%
3 7 69%
4 8 64%
5 9 60%
6 10 57%
8 (bigint) 12 52%
10 14 49%
Questions:
How were the above NULL percentages calculated?
I'm having trouble figuring out the math. For example, consider the row for bigint. If 52% of values in a sparse bigint column are NULL, then the 48% of non-NULL values should require 12 bytes × 48% = 5.76 bytes on average, which is 8 − 5.76 = 2.24 bytes less compared to a nonsparse column. But this is a savings of 2.24 bytes ÷ 8 bytes = 28%, not 40%.
What percentage of data must be NULL to achieve a space savings of 20%?
The post Why & When should I use SPARSE COLUMN? doesn't answer either question.
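For what it's worth, the calculation in question 1 can be written out as a formula (this only formalizes the naive model used above, which apparently does not reproduce the documented 40%):

\text{savings} = \frac{s_{\text{nonsparse}} - (1 - p_{\text{NULL}})\, s_{\text{sparse}}}{s_{\text{nonsparse}}}
               = \frac{8 - 0.48 \times 12}{8} = 0.28 \quad \text{for bigint at } p_{\text{NULL}} = 0.52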
I have to calculate the storage requirements, in terms of blocks, for the following table:
Bill(billno number(7),billdate date,ccode varchar2(20),amount number(9,2))
The table storage attributes are:
PCTFREE=20, INITRANS=4, PCTUSED=60, BLOCKSIZE=8K, NUMBER OF ROWS=100000
I searched a lot on the internet and referred to many books, but didn't find anything.
First you need to figure out what the typical value for the varchar2 column is; the total size will depend on that. I created two tables from your BILL table: BILLMAX, where ccode always takes its full 20 characters ('12345678901234567890'), and BILLMIN, which always has NULL in ccode.
The results are:
TABLE_NAME NUM_ROWS AVG_ROW_LEN BLOCKS
BILLMAX 3938 37 28
BILLMIN 3938 16 13
select table_name, num_rows, avg_row_len, blocks from user_tables
where table_name in ( 'BILLMIN', 'BILLMAX')
As you can see, the number of blocks depends on the average row length. Use exec dbms_stats.GATHER_TABLE_STATS('YourSchema','BILL') to refresh the values inside user_tables.
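The test setup described above could be reproduced with something like the following sketch (the fill values and the 3938-row count are just the ones behind the figures above):

-- BILLMAX: ccode always filled to its full 20 characters
CREATE TABLE billmax (
  billno   NUMBER(7),
  billdate DATE,
  ccode    VARCHAR2(20),
  amount   NUMBER(9,2)
) PCTFREE 20 PCTUSED 60 INITRANS 4;

INSERT INTO billmax
SELECT ROWNUM, SYSDATE, '12345678901234567890', 1234567.89
FROM dual CONNECT BY LEVEL <= 3938;
COMMIT;

exec dbms_stats.GATHER_TABLE_STATS(USER, 'BILLMAX')

BILLMIN is the same except that ccode is left NULL.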
The other thing that you need to take into consideration is how big your extents will be. For example:
STORAGE (
INITIAL 64K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
will generate the first 16 extents with a size of 8 blocks each. After that it will start to create extents with a size of 1 MB (128 blocks).
So for BILLMAX it will generate 768 blocks and BILLMIN will take 384 blocks.
As you can see, the difference is quite big:
For BILLMAX: 16 * 8 + 128 * 5 = 768
For BILLMIN: 16 * 8 + 128 * 2 = 384
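The actual extent allocation can be checked against USER_EXTENTS, for example:

SELECT segment_name,
       COUNT(*)    AS extents,
       SUM(blocks) AS blocks
FROM   user_extents
WHERE  segment_name IN ('BILLMAX', 'BILLMIN')
GROUP BY segment_name;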
The docs for SQL Server say:
... The storage size, in bytes, is two times the actual length of data entered + 2 bytes.
What do these 2 additional bytes mean?
For a field of nvarchar(160), sys.columns.max_length returns 320; there are no extra 2 bytes.
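A quick check of the value size itself (a sketch): DATALENGTH reports only the 2 bytes per character of the data, while the "+ 2 bytes" from the docs is per-row storage overhead for the variable-length value rather than part of the value, which would explain why it shows up neither here nor in max_length.

DECLARE @v nvarchar(160) = N'abc';
SELECT DATALENGTH(@v) AS value_bytes,  -- 6 = 3 characters * 2 bytes
       LEN(@v)        AS char_count;   -- 3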
I'm trying to compare tablespace sizes between 2 databases. I have already extracted the fields needed for the comparison, as shown below:
STAT-TBS-DB-SOURCE.lst (column 1: TBS name, column 2: real size)
TBS001 12
TBS002 50
TBS003 20
TBS004 45
STAT-TBS-DBTARGET.lst (column 1: TBS name, column 2: max size)
TBS001 10
TBS002 50
TBS003 20
TBS004 40
I need to compare the second column (c2) of the 2 files (f1, f2): if f2.c2 < f1.c2, then print: increase Tablespace f1.c1 by (f1.c2 - f2.c2) MB.
What solution do you have for me?
I tried with awk, but I cannot get the value of f1.c2.
Thanks
kent$ awk 'NR==FNR{a[$1]=$2;next}$1 in a && $2<a[$1]{
printf "increase Tablespace %s by %d MB\n",$1,(a[$1]-$2)}' STAT-TBS-DB-SOURCE.lst STAT-TBS-DBTARGET.lst
increase Tablespace TBS001 by 2 MB
increase Tablespace TBS004 by 5 MB