SQL Server: Record size larger than expected

My table consists of 3 columns
| Column Name | Data Type | Size |
| - | - | - |
| Value | real | 4 |
| LogId | int | 4 |
| SigId | smallint | 2 |
A primary key is defined on the columns LogId, SigId.
The sum of all sizes is 4+4+2=10 bytes; however, using sys.dm_db_index_physical_stats I get that the average (and min/max) record size in bytes is 25. Can someone explain? Am I comparing apples and oranges?

The physical record length includes row overhead in addition to the space needed for the actual column values. On my SQL Server instance, I get an average record length of 17 reported with the following table:
CREATE TABLE dbo.Example1(
Value real NOT NULL
, LogId int NOT NULL
, SigId smallint NOT NULL
, CONSTRAINT PK_Example1 PRIMARY KEY CLUSTERED(LogId, SigId)
);
GO
INSERT INTO dbo.Example1 (Value, LogId, SigId) VALUES(1, 2, 3);
GO
SELECT avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Example1'),1,0,'DETAILED')
WHERE index_level = 0;
GO
The 17-byte record length reported by sys.dm_db_index_physical_stats includes 10 bytes for data, 4 bytes for the record header, 2 bytes for the column count, and 1 byte for the NULL bitmap. See Paul Randal's Anatomy of a record article for details of the record structure.
Below is a script to dump the first clustered index data page using DBCC PAGE, with the page located by the undocumented (don't use it in production) sys.dm_db_database_page_allocations table-valued function:
DECLARE
    @database_id int = DB_ID()
    , @object_id int = OBJECT_ID(N'dbo.Example1')
    , @allocated_page_file_id int
    , @allocated_page_page_id int;
--get first clustered index data page
SELECT
    @allocated_page_file_id = allocated_page_file_id
    , @allocated_page_page_id = allocated_page_page_id
FROM sys.dm_db_database_page_allocations(@database_id, @object_id, 1, 1, 'DETAILED')
WHERE
    page_type_desc = N'DATA_PAGE'
    AND previous_page_page_id IS NULL; --first page of clustered index
--dump record
DBCC TRACEON(3604);
DBCC PAGE(@database_id, @allocated_page_file_id, @allocated_page_page_id, 1);
DBCC TRACEOFF(3604);
GO
Here is an excerpt from the results on my instance with the physical record structure fields called out:
DATA:
Slot 0, Offset 0x60, Length 17, DumpStyle BYTE
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP Record Size = 17
Memory Dump #0x0000002262C7A060
0000000000000000: 10000e00 02000000 03000000 803f0300 00 .............?...
| | | | | |null bitmap (1 byte)
| | | | |column count (2 bytes)
| | | |Value column data (4-byte real)
| | |SigId column data (2-byte smallint)
| |LogId column data (4-byte int)
|Record header (2-byte record type and 2-byte offset to null bitmap)
As to why your actual record length is 25 instead of the 17 in this example, the likely cause is schema changes made after the table was initially created, as Martin suggested in his comment. If the database has a row-versioning isolation level enabled, there will be additional overhead as mentioned in Paul's blog post, but I doubt that is the reason here since that overhead would be more than 8 bytes.
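If schema changes (for example, dropped or altered columns) are the cause, rebuilding the clustered index removes the leftover per-row overhead so the reported record size reflects the current table definition. Below is a minimal sketch, assuming a placeholder table name dbo.YourTable (substitute your actual table):
ALTER INDEX ALL ON dbo.YourTable REBUILD;
GO
--re-check the reported record size after the rebuild
SELECT avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.YourTable'), 1, 0, 'DETAILED')
WHERE index_level = 0;
GO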

Related

SQL Server Merge Update With Partial Sources

I have a target table for which partial data arrives at different times from 2 departments. The keys they use are the same, but the fields they provide are different. Most of the rows they provide have common keys, but there are some rows that are unique to each department. My question is about the fields, not the rows:
Scenario
the target table has a key and 30 fields.
Dept. 1 provides fields 1-20
Dept. 2 provides fields 21-30
Suppose I loaded Q1 data from Dept. 1, and that created new rows 100-199 and populated fields 1-20. Later, I receive Q1 data from Dept. 2. Can I execute the same merge code I previously used for Dept. 1 to update rows 100-199 and populate fields 21-30 without unintentionally changing fields 1-20? Alternatively, would I have to tailor separate merge code for each Dept.?
In other words, does (or can) "Merge / Update" operate only on target fields that are present in the source table while ignoring target fields that are NOT present in the source table? In this way, Dept. 1 fields would NOT be modified when merging Dept. 2, or vice-versa, in the event I get subsequent corrections to this data from either Dept.
You can use a MERGE statement, where you define a source and a target, and specify what happens when a row is found in both, only in the source, or only in the target. You can even extend it with custom logic, such as "it is only in the source and older than X" or "it is from department Y".
-- I'm skipping the fields 2-20 and 22-30, just to make this shorter.
create table #target (
id int primary key,
field1 varchar(100), -- and so on until 20
field21 varchar(100) -- and so on until 30
)
create table #dept1 (
id int primary key,
field1 varchar(100)
)
create table #dept2 (
id int primary key,
field21 varchar(100)
)
/*
Creates some data to merge into the target.
The expected result is:
| id | field1 | field21 |
| - | - | - |
| 1 | dept1: 1 | dept2: 1 |
| 2 | | dept2: 2 |
| 3 | dept1: 3 | |
| 4 | dept1: 4 | dept2: 4 |
| 5 | | dept2: 5 |
*/
insert into #dept1 values
(1,'dept1: 1'),
--(2,'dept1: 2'),
(3,'dept1: 3'),
(4,'dept1: 4')
insert into #dept2 values
(1,'dept2: 1'),
(2,'dept2: 2'),
--(3,'dept2: 3'),
(4,'dept2: 4'),
(5,'dept2: 5')
-- Inserts the data from the first department. This could also be a merge, if necessary (see the sketch after this answer).
insert into #target(id, field1)
select id, field1 from #dept1
merge into #target t
using (select id, field21 from #dept2) as source_data(id, field21)
on (source_data.id = t.id)
when matched then update set field21=source_data.field21
when not matched by source and t.field21 is not null then delete -- you can even use merge to remove some records that match your criteria
when not matched by target then insert (id, field21) values (source_data.id, source_data.field21); -- Every merge statement should end with ;
select * from #target
You can see this code running on this DB Fiddle
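The comment in the script above notes that the initial load from #dept1 could also be a MERGE. As a minimal sketch under the same temp-table setup (mirroring the #dept2 statement, but listing only field1):
merge into #target t
using (select id, field1 from #dept1) as source_data(id, field1)
on (source_data.id = t.id)
when matched then update set field1 = source_data.field1
when not matched by target then insert (id, field1) values (source_data.id, source_data.field1);
Because each department's MERGE names only its own columns in the update and insert clauses, the other department's fields are never touched, which is exactly the behaviour asked about in the question.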

Update a Column in All Rows with Different Values from an expression based on a parameter from another Column in the Current Row

I recently changed my DB from MS Access to SQL Server Express. Access is a wonderful small-scale DB for a single user, with very simple VBA functionality that I miss in SQL Server!
In my old Access DB I have an [Account] table with a sub procedure that updates a field in all rows of the table with the result of this expression:
[SortOrder] = [AccountNumber] * (10 ^ (Len(MaximumAccountNumber) - Len([AccountNumber])))
where MaximumAccountNumber is a variable representing the max AccountNumber in the table.
I have been searching for a solution for many days, but no example shows me how to use a value from a column in the SAME row to calculate the result for another column in that row, and so on for all the rows in the table, as in the following VBA code:
Do while Not rst.EOF
rst.Edit
rst![Field1] = rst![Field2] * (10 ^ (Len(MaximumAccountNumber) - Len(rst![Field2])))
rst.Update
rst.MoveNext
Loop
How can I implement such an update efficiently in SQL Server T-SQL without using a cursor, given that the row count in the table can reach more than 100,000?
I want to do this by creating a stored procedure that I can fire from a trigger after every insert of a new account, to recalculate the SortOrder of all rows in the table, as in the following:
CREATE PROCEDURE [dbo].[SortingOrder]
@MaxOrder Numeric(38,0) = 0,
@Digits int = 0
AS
BEGIN
set @MaxOrder = (select MAX([AccNumber]) from Account)
set @Digits = (select LEN(@MaxOrder))
Update dbo.Account
Set [SortOrder] = [AccNumber] * POWER(10, @Digits - LEN([AccNumber]))
END
GO
As in This Sample Table [Account]:
AccID AccNumber SortOrder
----- --------- ---------
023 23 2300
054 243 2430
153 5434 5434
But when a new record is inserted, I want the SortOrder of all rows to be updated to a number with the same digit count, based on 10 to the power of (length of the max AccNumber minus length of AccNumber), as in the following:
AccID AccNumber SortOrder
----- --------- ---------
023 23 230000000
054 243 243000000
153 5434 543400000
233 432345625 432345625
Try this:
Table Schema:
CREATE TABLE Account(AccID INT,AccNumber BIGINT,SortOrder BIGINT)
INSERT INTO Account VALUES(23,23,23)
INSERT INTO Account VALUES(54,254,254)
INSERT INTO Account VALUES(125,25487,25487)
T-SQL Query:
DECLARE @MaxValLen INT
SELECT @MaxValLen = LEN(MAX(AccNumber)) FROM Account
UPDATE Account
SET SortOrder = AccNumber * POWER(10, @MaxValLen - LEN(AccNumber))
Output:
| AccID | AccNumber | SortOrder |
|-------|-----------|-----------|
| 23 | 23 | 23000 |
| 54 | 254 | 25400 |
| 125 | 25487 | 25487 |
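Since the question asks for something that can be fired after every insert, the same set-based UPDATE can be wrapped in an AFTER INSERT trigger. This is only a sketch under the table and column names used above; recalculating every row on each insert is fine for around 100,000 rows, but for much larger tables you may want to update only the rows whose SortOrder actually changes:
CREATE TRIGGER dbo.trg_Account_SortOrder
ON dbo.Account
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @MaxValLen INT;
    SELECT @MaxValLen = LEN(MAX(AccNumber)) FROM dbo.Account;
    -- cast the base to bigint so POWER does not overflow INT for long account numbers
    UPDATE dbo.Account
    SET SortOrder = AccNumber * POWER(CAST(10 AS BIGINT), @MaxValLen - LEN(AccNumber));
END
GO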

Break up the data in a database column of a record into multiple records

Azure SQL Server - we have a table like this:
MyTable:
ID Source ArticleText
-- ------ -----------
1 100 <nvarchar(max) field with unstructured text from media articles>
2 145 "
3 866 "
4 232 "
ID column is the primary key and auto-increments on INSERTS.
I run this query to find the records with the largest data size in the ArticleText column:
SELECT TOP 500
ID, Source, DATALENGTH(ArticleText)/1048576 AS Size_in_MB
FROM
MyTable
ORDER BY
DATALENGTH(ArticleText) DESC
We are finding that for many reasons both technical and practical, the data in the ArticleText column is just too big in certain records. The above query allows me to look at a range of sizes for our largest records, which I'll need to know for what I'm trying to formulate here.
The feat I need to accomplish is: for all existing records in this table, any record whose ArticleText DATALENGTH is greater than some threshold should be broken into N records, where each record contains the same value in the Source column but the data in the ArticleText column is split across those N records in smaller chunks.
How would one achieve this if the exact requirement was, say, take all records whose ArticleText DATALENGTH is greater than 10MB and break each into 3 records, where the resulting records' Source column value is the same across the 3 records, but the ArticleText data is separated into three chunks?
In essence, we would need to divide the DATALENGTH by 3 and apply the first 1/3 of the text data to the first record, the 2nd 1/3 to the 2nd record, and the 3rd 1/3 to the third record.
Is this even possible in SQL Server?
You can use the following code to create a side table with the needed data:
CREATE TABLE #mockup (ID INT IDENTITY, [Source] INT, ArticleText NVARCHAR(MAX));
INSERT INTO #mockup([Source],ArticleText) VALUES
(100,'This is a very long text with many many words and it is still longer and longer and longer, and even longer and longer and longer')
,(200,'A short text')
,(300,'A medium text, just long enough to need a second part');
DECLARE @partSize INT=50;
WITH recCTE AS
(
SELECT ID,[Source]
,1 AS FragmentIndex
,A.Pos
,CASE WHEN A.Pos>0 THEN LEFT(ArticleText,A.Pos) ELSE ArticleText END AS Fragment
,CASE WHEN A.Pos>0 THEN SUBSTRING(ArticleText,A.Pos+2,DATALENGTH(ArticleText)/2) END AS RestString
FROM #mockup
CROSS APPLY(SELECT CASE WHEN DATALENGTH(ArticleText)/2 > @partSize
THEN @partSize - CHARINDEX(' ',REVERSE(LEFT(ArticleText,@partSize)))
ELSE -1 END AS Pos) A
UNION ALL
SELECT r.ID,r.[Source]
,r.FragmentIndex+1
,A.Pos
,CASE WHEN A.Pos>0 THEN LEFT(r.RestString,A.Pos) ELSE r.RestString END
,CASE WHEN A.Pos>0 THEN SUBSTRING(r.RestString,A.Pos+2,DATALENGTH(r.RestString)/2) END AS RestString
FROM recCTE r
CROSS APPLY(SELECT CASE WHEN DATALENGTH(r.RestString)/2 > @partSize
THEN @partSize - CHARINDEX(' ',REVERSE(LEFT(r.RestString,@partSize)))
ELSE -1 END AS Pos) A
WHERE DATALENGTH(r.RestString)>0
)
SELECT ID,[Source],FragmentIndex,Fragment
FROM recCTE
ORDER BY [Source],FragmentIndex;
GO
DROP TABLE #mockup
The result
+----+--------+---------------+---------------------------------------------------+
| ID | Source | FragmentIndex | Fragment |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 1 | This is a very long text with many many words and |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 2 | it is still longer and longer and longer, and |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 3 | even longer and longer and longer |
+----+--------+---------------+---------------------------------------------------+
| 2 | 200 | 1 | A short text |
+----+--------+---------------+---------------------------------------------------+
| 3 | 300 | 1 | A medium text, just long enough to need a second |
+----+--------+---------------+---------------------------------------------------+
| 3 | 300 | 2 | part |
+----+--------+---------------+---------------------------------------------------+
Now you have to update the existing row with the value at FragmentIndex=1, and insert the rows with FragmentIndex>1. Do this ordered by FragmentIndex and your IDENTITY ID column will reflect the correct order.
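A minimal sketch of that final step, assuming the CTE output has been materialized into a temp table named #Fragments(ID, Source, FragmentIndex, Fragment) (the name is illustrative) and the target is the MyTable from the question:
--overwrite each original row with its first fragment
UPDATE t
SET t.ArticleText = f.Fragment
FROM MyTable AS t
JOIN #Fragments AS f ON f.ID = t.ID
WHERE f.FragmentIndex = 1;
--append the remaining fragments as new rows, keeping the same Source
INSERT INTO MyTable (Source, ArticleText)
SELECT f.Source, f.Fragment
FROM #Fragments AS f
WHERE f.FragmentIndex > 1
ORDER BY f.ID, f.FragmentIndex;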

SQL IF/CASE statement

I am fairly new to SQL and I can't figure out what to do here.
I have a database of financial data with one column being [Days]. I need to add a new column that assigns the category that the number of days falls into (0, 1-30, 30-60, etc.).
In Excel this would look like this: =IF(A1>90,"90-120",IF(A1>60,"60-90",...))
The final database should look like this:
Days | Category
29 | 0-30
91 | 90-120
0 | 0
.
.
.
Thx in advance
You can use case:
select days,
(case when days > 90 then '90-120' -- should this be >= ?
when days > 60 then '60-90' -- should this be >= ?
. . .
end) as Category
from t;
Complete SQL:
select Days,
(case when Days > 90 then '91-120'
when Days > 60 then '61-90'
when Days > 30 then '31-60'
when Days > 0 then '1-30'
else '0' end) as Category
from t;
Here is another way using IIF if you are using SQL Server 2012+:
CREATE TABLE Numbers (Number INT );
INSERT INTO Numbers VALUES
(1),
(0),
(15),
(29),
(32),
(54),
(59),
(60),
(63),
(89),
(90),
(140);
SELECT IIF(Number BETWEEN 90 AND 120, '90-120',
IIF(Number BETWEEN 60 AND 89, '60-90',
IIF(Number BETWEEN 30 AND 59 , '30-60' ,
IIF(Number BETWEEN 1 AND 29, '1-30' ,
IIF(Number = 0, '0', 'OutRange'))))) AS Category
FROM Numbers;
try this
create table #tmp ([Days] int)
insert into #tmp values (29)
insert into #tmp values (91)
insert into #tmp values (0)
insert into #tmp values (65)
SELECT
CASE WHEN [Days]=0 then CONVERT(VARCHAR(15),0)
ELSE CONVERT(VARCHAR(15),[Days]/30*30)+'-'+ CONVERT(VARCHAR(15),([Days]/30*30)+30) END AS Category
from #tmp
drop table #tmp
select *, number/30, ltrim(number/30*30) + '-' + ltrim((number/30+1)*30) from #Numbers
+--------+-----------+-------+
| Number | Number/30 | Range |
+--------+-----------+-------+
| 1      | 0         | 0-30  |
| 0      | 0         | 0-30  |
| 15     | 0         | 0-30  |
| 29     | 0         | 0-30  |
| 32     | 1         | 30-60 |
| 54     | 1         | 30-60 |
| 59     | 1         | 30-60 |
| 60     | 2         | 60-90 |
| 63     | 2         | 60-90 |
+--------+-----------+-------+
One solution to your dilemma may be to insert a new database column that uses a SQL Server feature known as a "Computed Column Specification" into your table.
A Computed Column Specification is a method whereby a database column's value can be calculated when the row is inserted or updated. That value can optionally also be persisted in the database so that when it is queried no calculation has to be performed at that time (just on the INSERT or UPDATE).
I like this solution because you don't have to do any special calculations upon querying the data. You'll pull the new column data with a simple SELECT.
You didn't list specifics, so let's suppose that your database table is named [FinancialData], and that it has defined in it a column named [Days] that is of some numeric type (int, smallint, tinyint, decimal, float, money, numeric, or real).
You can add the computed column as follows:
ALTER TABLE [FinancialData] ADD
Category AS (CASE WHEN [Days] >= 90 THEN '90-120'
WHEN [Days] >= 60 THEN '60-90'
WHEN [Days] >= 30 THEN '30-60'
WHEN [Days] >= 1 THEN '1-30'
WHEN [Days] = 0 THEN '0'
END) PERSISTED;
Note the word "PERSISTED" in the SQL statement above. This is what causes the database table to actually store the calculated value in the database table when the [Days] column is inserted or changed. If you don't want to store the value, simply leave out the word "PERSISTED".
When the computed column is added to the table by executing the SQL statement above, values will be computed and stored for all existing rows in the table. When inserting a new row into the table, do not supply a value for the new [Category] column. This is because a) it won't work, and b) that column's value will be computed from the [Days] column value.
To retrieve data from the new column, you simply list that column in the SELECT statement (or use *):
SELECT [Days], [Category]
FROM [FinancialData];
A couple of caveats to note: 1) This is SQL Server specific. Most other database engines have no support for this feature. 2) You didn't state whether the [Days] column is nullable - if so, this solution will have to be modified to support that.
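On the second caveat: because the CASE above has no ELSE branch, a NULL [Days] value would simply produce a NULL [Category]. If you want an explicit label instead, a hedged variation (the 'Unknown' label is purely illustrative) would be to define the column like this in place of the statement above:
ALTER TABLE [FinancialData] ADD
Category AS (CASE WHEN [Days] IS NULL THEN 'Unknown'
WHEN [Days] >= 90 THEN '90-120'
WHEN [Days] >= 60 THEN '60-90'
WHEN [Days] >= 30 THEN '30-60'
WHEN [Days] >= 1 THEN '1-30'
WHEN [Days] = 0 THEN '0'
END) PERSISTED;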

SQL Server : Bulk insert a Datatable into 2 tables

Consider this datatable :
word wordCount documentId
---------- ------- ---------------
Ball 10 1
School 11 1
Car 4 1
Machine 3 1
House 1 2
Tree 5 2
Ball 4 2
I want to insert these data into two tables with this structure :
Table WordDictionary
(
Id int,
Word nvarchar(50),
DocumentId int
)
Table WordDetails
(
Id int,
WordId int,
WordCount int
)
FOREIGN KEY (WordId) REFERENCES WordDictionary(Id)
But because I have thousands of records in the initial table, I have to do this in just one transaction (a batch query); for example, a bulk insert could serve this purpose.
The question here is how I can separate this data into these two tables, WordDictionary and WordDetails.
For more details :
Final result must be like this :
Table WordDictionary:
Id word
---------- -------
1 Ball
2 School
3 Car
4 Machine
5 House
6 Tree
and table WordDetails :
Id wordId WordCount DocumentId
---------- ------- ----------- ------------
1 1 10 1
2 2 11 1
3 3 4 1
4 4 3 1
5 5 1 2
6 6 5 2
7 1 4 2
Notice :
The words in the source can be duplicated, so I must check for word existence in table WordDictionary before inserting records into these tables; if a word is already found in WordDictionary, its existing Word ID must be inserted into table WordDetails (see the word Ball).
Finally, the million-dollar problem is: this insertion must be done as fast as possible.
If you're looking to just load the table the first time without any updates to the table over time you could potentially do it this way (I'm assuming you've already created the tables you're loading into):
You can put all of the distinct words from the datatable into the WordDictionary table first:
SELECT DISTINCT word
INTO WordDictionary
FROM datatable;
After you populate your WordDictionary, you can use the ID values from it and the rest of the information from datatable to load your WordDetails table:
SELECT WD.Id as wordId, DT.wordCount as WordCount, DT.documentId AS DocumentId
INTO WordDetails
FROM datatable as DT
INNER JOIN WordDictionary AS WD ON WD.word = DT.word
There is a little discrepancy between the declared table schema and your example data, but it is resolved below:
1) Setup
-- this the table with the initial data
-- drop table DocumentWordData
create table DocumentWordData
(
Word NVARCHAR(50),
WordCount INT,
DocumentId INT
)
GO
-- these are the result tables with extra information (identity, primary key constraints, working foreign key definition)
-- drop table WordDictionary
create table WordDictionary
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDictionary PRIMARY KEY,
Word nvarchar(50)
)
GO
-- drop table WordDetails
create table WordDetails
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDetails PRIMARY KEY,
WordId int CONSTRAINT FK_WordDetails_Word REFERENCES WordDictionary,
WordCount int,
DocumentId int
)
GO
2) The actual script to put data in the last two tables
begin tran
-- this is to make sure that if anything in this block fails, then everything is automatically rolled back
set xact_abort on
-- the dictionary is obtained by considering all distinct words
insert into WordDictionary (Word)
select distinct Word
from DocumentWordData
-- details are generating from initial data joining the word dictionary to get word id
insert into WordDetails (WordId, WordCount, DocumentId)
SELECT W.Id, DWD.WordCount, DWD.DocumentId
FROM DocumentWordData DWD
JOIN WordDictionary W ON W.Word = DWD.Word
commit
-- just to test the results
select * from WordDictionary
select * from WordDetails
I expect this script to run very fast if you do not have a very large number of records (a few million at most).
This is the query. I'm using temp tables to be able to test it. If you use the two CTEs below, you'll be able to generate the final result.
1. Setting up sample data for the test:
create table #original (word varchar(10), wordCount int, documentId int)
insert into #original values
('Ball', 10, 1),
('School', 11, 1),
('Car', 4, 1),
('Machine', 3, 1),
('House', 1, 2),
('Tree', 5, 2),
('Ball', 4, 2)
2. Use cte1 and cte2. In your real database, you need to replace #original with the actual table that holds all the initial records.
;with cte1 as (
select ROW_NUMBER() over (order by word) Id, word
from #original
group by word
)
select * into #WordDictionary
from cte1
;with cte2 as (
select ROW_NUMBER() over (order by #original.word) Id, Id as wordId,
#original.word, #original.wordCount, #original.documentId
from #WordDictionary
inner join #original on #original.word = #WordDictionary.word
)
select * into #WordDetails
from cte2
select * from #WordDetails
This will be the data in #WordDetails:
+----+--------+---------+-----------+------------+
| Id | wordId | word | wordCount | documentId |
+----+--------+---------+-----------+------------+
| 1 | 1 | Ball | 10 | 1 |
| 2 | 1 | Ball | 4 | 2 |
| 3 | 2 | Car | 4 | 1 |
| 4 | 3 | House | 1 | 2 |
| 5 | 4 | Machine | 3 | 1 |
| 6 | 5 | School | 11 | 1 |
| 7 | 6 | Tree | 5 | 2 |
+----+--------+---------+-----------+------------+
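If you then need to move the generated rows from the temp tables into the permanent tables, a minimal sketch (assuming the WordDictionary and WordDetails definitions from the setup in the previous answer, where Id is an IDENTITY column) would be:
--copy the dictionary, preserving the ROW_NUMBER-generated Ids
SET IDENTITY_INSERT WordDictionary ON;
INSERT INTO WordDictionary (Id, Word)
SELECT Id, word FROM #WordDictionary;
SET IDENTITY_INSERT WordDictionary OFF;
--copy the details; ordering by the generated Id keeps the same insertion sequence
INSERT INTO WordDetails (WordId, WordCount, DocumentId)
SELECT wordId, wordCount, documentId
FROM #WordDetails
ORDER BY Id;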
