I have a stored procedure which executes a query and returns the row into variables, like below:
SELECT @item_id = I.ID, @label_ID = SL.label_id
FROM tb_A I
LEFT JOIN tb_B SL ON I.ID = SL.item_id
WHERE I.NUMBER = @VAR
I have an IF to check whether @label_ID is null or not. If it is null, it goes to an INSERT statement; otherwise it goes to an UPDATE statement. Let's focus on the INSERT, where I know I'm having problems. The INSERT part is like below:
IF @label_ID IS NULL
BEGIN
INSERT INTO tb_B (item_id, label_qrcode, label_barcode, data_leitura, data_inclusao)
VALUES (@item_id, @label_qrcode, @label_barcode, @data_leitura, GETDATE())
END
So, tb_B has a PK on the ID column and an FK on the item_id column, which refers to column ID in table tb_A.
I ran SQL Server Profiler and saw that this stored procedure sometimes takes around 2,300 ms, while its normal average duration is 16 ms.
I ran the "Execution Plan" and the biggest cost is in the "Clustered Index Insert" component. Showing below:
[Screenshots: estimated execution plan, actual execution plan, and operator details]
More details about the tables:
tb_A storage:
  Index space: 6,853.188 MB
  Row count: 45,988,842
  Data space: 5,444.297 MB

tb_B storage:
  Index space: 1,681.688 MB
  Row count: 15,552,847
  Data space: 1,663.281 MB
Statistics for INDEX 'PK_tb_B':

Name     Updated             Rows      Rows Sampled  Steps  Density  Average Key Length  String Index  Unfiltered Rows
-------  ------------------  --------  ------------  -----  -------  ------------------  ------------  ---------------
PK_tb_B  Sep 23 2018 2:30AM  15369616  15369616      5      1        4                   NO            15369616

All Density   Average Length  Columns
------------  --------------  -------
6.506343E-08  4               id

Histogram steps:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
1             0           1        0                    1
8192841       8192198     1        8192198              1
8270245       65535       1        65535                1
15383143      7111878     1        7111878              1
15383144      0           1        0                    1
Statistics for INDEX 'IDX_tb_B_ITEM_ID':

Name              Updated             Rows      Rows Sampled  Steps  Density  Average Key Length  String Index  Unfiltered Rows
----------------  ------------------  --------  ------------  -----  -------  ------------------  ------------  ---------------
IDX_tb_B_ITEM_ID  Sep 23 2018 2:30AM  15369616  15369616      12     1        7.999424            NO            15369616

All Density   Average Length  Columns
------------  --------------  -----------
6.50728E-08   3.999424        item_id
6.506343E-08  7.999424        item_id, id

Histogram steps:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
?             0           2214     0                    1
16549857      0           1        0                    1
29907650      65734       1        65734                1
32097131      131071      1        131071               1
32296132      196607      1        196607               1
32406913      98303       1        98303                1
40163331      7700479     1        7700479              1
40237216      65535       1        65535                1
47234636      6946815     1        6946815              1
47387143      131071      1        131071               1
47439431      31776       1        31776                1
47439440      0           1        0                    1
[Screenshots: index fragmentation for PK_tb_B and IDX_tb_B_Item_ID]
Are there any best practices I can apply to make the execution duration stable?
Hope you can help me !!!
Thanks in advance...
The problem is probably the data type of the clustered index key. Clustered indexes store the table's data ordered by the key values. By default, your primary key is created with a clustered index. This is often the best place to have it, but not always.

If you have, for example, a clustered index over an NVARCHAR column, every INSERT needs to find the right place for the new record. If your table has one million rows ordered alphabetically and your new record starts with A, the clustered index has to make room among the records from B to Z to fit the new record into the A group (in practice, causing page splits). If your new record starts with Z, fewer records are affected, but that still doesn't make it fine. If you don't have a column that lets you insert new records sequentially, you can create an identity column for this, or use another column that is logically sequential for any transaction entering the system, for example a datetime column that records the time at which the insert occurs.
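For instance, here is a minimal sketch of a table whose clustered key is an ever-increasing IDENTITY column (the table and column names are illustrative only, not taken from the question):

-- Hypothetical example: because the clustered key always increases,
-- every INSERT appends at the end of the index instead of forcing
-- page splits somewhere in the middle.
CREATE TABLE dbo.example_tbl
(
    id INT IDENTITY(1,1) NOT NULL,
    label NVARCHAR(100) NOT NULL,
    created_at DATETIME NOT NULL DEFAULT GETDATE(),
    CONSTRAINT PK_example_tbl PRIMARY KEY CLUSTERED (id)
);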
If you want more info, please check this Microsoft documentation
I have a query analogous to:
update x
set x.y = (
select sum(x2.y)
from mytable x2
where x2.y < x.y
)
from mytable x
The point being: I'm iterating over rows and updating a field based on a subquery over those same fields, which are changing.
What I'm seeing is the subquery is being executed for each row before any updates occur, so the changed values for each row are not being picked up.
How can I force the subquery to be re-evaluated for each row of the update?
Is there a suitable table hint or something?
As an aside, I was doing the below and it did work; however, since modifying my query somewhat (for logic purposes, not to try to solve this issue), this trick no longer works :(
declare #temp int
update x
set #temp = (
select sum(x2.y)
from mytable x2
where x2.y < x.y
),
x.y = #temp
from mytable x
I'm not particularly concerned about performance; this is a background task run over a few rows.
It looks like the task is ill-defined, or other rules should apply.
Let's look at an example. Say you have the values 4, 1, 2, 3, 1, 2.
SQL updates rows based on the original values, i.e. during a single UPDATE statement the newly calculated values do NOT mix with the original values:
-- only original values used
4 -> 9 (1+2+3+1+2)
1 -> null
2 -> 2 (1+1)
3 -> 6 (1+2+1+2)
1 -> null
2 -> 2 (1+1)
Based on your request, you want the update of each row to take the newly calculated values into account. (Note that SQL does not guarantee the sequence in which rows are processed.)
Let's do this calculation by processing rows from top to bottom:
-- from top
4 -> 9 (1+2+3+1+2)
1 -> null
2 -> 1 (1)
3 -> 4 (1+1+2)
1 -> null
2 -> 1 (1)
Do the same in the other direction, from bottom to top:
-- from bottom
4 -> 3 (2+1)
1 -> null
2 -> 1 (1)
3 -> 5 (2+2+1)
1 -> null
2 -> 2 (1+1)
As you can see, your expected result is order-dependent, and therefore inconsistent. To make it right you need to correct the calculation rule, for instance by defining a strict ordering of the rows to process (date, id, ...).
Also, if you want to do some recursive processing, look at common table expressions:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
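If you really do need each row's update to see the values written by earlier rows, the usual workaround is row-by-row processing with an explicit ORDER BY, which makes the evaluation sequence deterministic. A minimal sketch, assuming mytable has an id column that defines the desired order:

-- Hypothetical row-by-row variant: each UPDATE sees the values
-- already written by the previous iterations.
DECLARE @id INT;

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT id FROM mytable ORDER BY id;

OPEN cur;
FETCH NEXT FROM cur INTO @id;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE x
    SET x.y = (SELECT SUM(x2.y) FROM mytable x2 WHERE x2.y < x.y)
    FROM mytable x
    WHERE x.id = @id;

    FETCH NEXT FROM cur INTO @id;
END

CLOSE cur;
DEALLOCATE cur;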
Sequence:
CREATE SEQUENCE STG.TEMP_PPC_SEQ AS BIGINT
START WITH 1
INCREMENT BY 1
NO MINVALUE
MAXVALUE 2147483647
NO CYCLE;
Select Query:
SELECT TPLCST.code,NEXT VALUE FOR STAGING.STG.TEMP_PPC_SEQ
FROM TEMP_PRODUCT_LIFE_CYCLE_STATUS_TYPE TPLCST
Result:
CODE NEXTVAL
30 8
80 10008
40 30008
50 40015
10 40016
20 20008
I am getting NEXTVAL in random order. How can I make the values sequential?
They are random because Netezza is massively parallel and each SPU gets its own block of sequence values.
You can use ROW_NUMBER() OVER (ORDER BY <>) to get a sequence value; please read below:
http://www.enzeecommunity.com/message/8914
http://www.enzeecommunity.com/message/3272
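A minimal sketch against the table from the question (ordering by code is just an assumption; use whatever column defines the order you want):

-- ROW_NUMBER() assigns contiguous values 1..N regardless of how the
-- rows are distributed across SPUs.
SELECT TPLCST.code,
       ROW_NUMBER() OVER (ORDER BY TPLCST.code) AS seq
FROM TEMP_PRODUCT_LIFE_CYCLE_STATUS_TYPE TPLCST;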
Are you attempting to make an autoincrement column? Netezza doesn't do that. Moreover, if you use a sequence generator the values will be unique but not contiguous; there will be holes in the sequence.
How do I rewrite this T-SQL code to produce the same results?
SELECT ACC.Title,
ACC.AdvertiserHierarchyId,
1 AS Counter
FROM admanAdvertiserHierarchy_tbl ACC
JOIN dbo.admanAdvertiserObjectType_tbl AOT ON AOT.AdvertiserObjectTypeId = ACC.AdvertiserObjectTypeId
WHERE (EXISTS
(SELECT 1
FROM dbo.admanAdvertiserHierarchy_tbl CAMP
JOIN dbo.admanAdvertiserAdGroup_tbl AG ON CAMP.AdvertiserHierarchyId = AG.AdvertiserHierarchyId
JOIN dbo.admanAdvertiserCreative_tbl AC ON AC.AdvertiserAdGroupId = AG.AdvertiserAdGroupId
AND CAMP.ParentAdvertiserHierarchyId = ACC.AdvertiserHierarchyId
WHERE CAMP.ERROR = 0
AND AC.Dirty & 7 > 0
AND AC.ERROR = 0
AND AG.ERROR = 0 ))
It's preventing the optimizer from using indexes efficiently. I'm trying to achieve the following results:
Title AdvertiserHierarchyId Counter
trcom65#travelrepublic.co.uk 15908 1
paul570#travelrepublic.co.uk 37887 1
es88#travelrepublic.co.uk 37383 1
it004#travelrepublic.co.uk 27006 1
011 10526 1
013 10528 1
033 12013 1
062 17380 1
076 20505 1
This is a count of the Dirty tinyint column:
Dirty total
0 36340607
1 117569
2 873553
3 59
which links to a static reason table:
DirtyReasonId Title
0 Nothing
1 Overnight Engine
2 End To End
3 Overnight And End To End
4 Pause Resume
5 Overnight Engine and Paused
6 Overnight Engine E2E and Paused
7 All Three
If you are asking specifically about the use of the BITWISE AND operator, I believe you are correct, and it's unlikely that SQL Server sees that as sargable, at least, not with an index with Dirty as a leading column.
You are showing only the lowest two bits in use (maximum value of Dirty is 3), yet you are testing the lowest three bits.
So AC.Dirty > 0 would return an equivalent result, given that 3 is the largest value of Dirty. But there is a possibility that other (higher-order) bits are set; for example, Dirty could be set to 8. So, if the intent is to check ONLY the lowest three bits, we need to ensure we test only those three lowest-order bits. This expression does that, and one of the predicates is sargable:
( AC.Dirty > 0 AND AC.Dirty % 8 > 0 )
This basically tests first whether ANY bits in AC.Dirty are set, and then checks whether any of the lowest three bits are set. (We're using the modulo operator to return the remainder of AC.Dirty divided by 8, which will of course return an integer value between 0 and 7. If we get a zero, then we know that none of the lower three bits are set; otherwise at least one of them is set.)
Just to be clear: the AC.Dirty > 0 predicate is redundant. It's included here in case you want to make sure the database can at least consider using an existing index with Dirty as a leading column.
I will mention that another option to consider would be adding a persisted COMPUTED COLUMN on the expression and creating an index on it. But that seems a bit of overkill for what you need here.
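If you did want to go that route, here is a sketch of the idea (the computed column and index names are invented):

-- Hypothetical: persist the low three bits so the bitwise test becomes
-- a plain comparison on an indexable column.
ALTER TABLE dbo.admanAdvertiserCreative_tbl
    ADD DirtyLow3 AS (Dirty & 7) PERSISTED;

CREATE NONCLUSTERED INDEX IX_AdvertiserCreative_DirtyLow3
    ON dbo.admanAdvertiserCreative_tbl (DirtyLow3);

-- The predicate AC.Dirty & 7 > 0 could then be written as AC.DirtyLow3 > 0.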
If you are asking specifically about getting an index used on table admanAdvertiserCreative_tbl (AC), then likely your best candidate would be a covering index on (AdvertiserAdGroupId, Error, Dirty).
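Something along these lines (the index name is invented):

CREATE NONCLUSTERED INDEX IX_AdvertiserCreative_AdGroup_Error_Dirty
    ON dbo.admanAdvertiserCreative_tbl (AdvertiserAdGroupId, Error, Dirty);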
The SQL rewrite below should return equivalent results, perhaps with better performance (depending on your data distribution, indexes, et al.)
Basically, replace the EXISTS (correlated subquery) with a JOIN to a subquery. The subquery returns distinct values of CAMP.ParentAdvertiserHierarchyId, which is the column you referenced to correlate the subquery.
This may or may not make use of any indexes, depending on what indexes are available. (You likely have unique clustered indexes on the primary keys and non-clustered indexes on the foreign keys, which should help join performance.)
Untested:
SELECT ACC.Title,
ACC.AdvertiserHierarchyId,
1 AS Counter
FROM admanAdvertiserHierarchy_tbl ACC
JOIN dbo.admanAdvertiserObjectType_tbl AOT
ON AOT.AdvertiserObjectTypeId = ACC.AdvertiserObjectTypeId
JOIN (SELECT CAMP.ParentAdvertiserHierarchyId
FROM dbo.admanAdvertiserHierarchy_tbl CAMP
JOIN dbo.admanAdvertiserAdGroup_tbl AG
ON CAMP.AdvertiserHierarchyId = AG.AdvertiserHierarchyId
JOIN dbo.admanAdvertiserCreative_tbl AC
ON AC.AdvertiserAdGroupId = AG.AdvertiserAdGroupId
WHERE CAMP.ERROR = 0
AND ( AC.Dirty > 0 AND AC.Dirty % 8 > 0 )
AND AC.ERROR = 0
AND AG.ERROR = 0 )
GROUP BY CAMP.ParentAdvertiserHierarchyId
) c
ON c.ParentAdvertiserHierarchyId = ACC.AdvertiserHierarchyId
I want to create a table in MS SQL Server 2005 to record details of certain system operations. As you can see from the table design below, every column apart from Details is non-nullable.
CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[Details] [varchar](max) NULL
)
Because the Details column won't always have data in it, is it more efficient to store this column in a separate table and provide a link to it instead?
CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[DetailID] [int] NULL
)
CREATE TABLE [Detail]
(
[DetailID] [int] IDENTITY(1,1) NOT NULL,
[Details] [varchar](max) NOT NULL
)
For a smaller data type I wouldn't really consider it, but for a varchar(max), does doing this help keep the table size smaller? Or am I just trying to outsmart the database and achieving nothing?
Keep it inline. Under the covers, SQL Server has stored the MAX columns in a separate 'allocation unit' since SQL Server 2005. See Table and Index Organization. This is in effect exactly the same as keeping the MAX column in its own table, but without any of the disadvantages of explicitly doing so.
Having an explicit table would actually be both slower (because of the foreign key constraint) and consume more space (because of the DetailID duplication). Not to mention that it requires more code, and bugs are introduced by... writing code.
Update
To check the actual location of data, a simple test can show it:
use tempdb;
go
create table a (
id int identity(1,1) not null primary key,
v_a varchar(8000),
nv_a nvarchar(4000),
m_a varchar(max),
nm_a nvarchar(max),
t text,
nt ntext);
go
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('v_a', N'nv_a', 'm_a', N'nm_a', 't', N'nt');
go
select %%physloc%%,* from a
go
The %%physloc%% pseudo column shows the actual physical location of the row; in my case it was page 200:
dbcc traceon(3604)
dbcc page(2,1, 200, 3)
Slot 0 Column 2 Offset 0x19 Length 3 Length (physical) 3
v_a = v_a
Slot 0 Column 3 Offset 0x1c Length 8 Length (physical) 8
nv_a = nv_a
Slot 0 Column 4 Offset 0x24 Length 3 Length (physical) 3
m_a = [BLOB Inline Data]
m_a = 0x6d5f61
Slot 0 Column 5 Offset 0x27 Length 8 Length (physical) 8
nm_a = [BLOB Inline Data]
nm_a = 0x6e006d005f006100
Slot 0 Column 6 Offset 0x2f Length 16 Length (physical) 16
t = [Textpointer]
TextTimeStamp = 131137536 RowId = (1:182:0)
Slot 0 Column 7 Offset 0x3f Length 16 Length (physical) 16
nt = [Textpointer]
TextTimeStamp = 131203072 RowId = (1:182:1)
All column values but the TEXT and NTEXT were stored inline, including the MAX types.
After changing the table option and inserting a new row (sp_tableoption does not affect existing rows), the MAX types were evicted into their own storage:
sp_tableoption 'a' , 'large value types out of row', '1';
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('2v_a', N'2nv_a', '2m_a', N'2nm_a', '2t', N'2nt');
dbcc page(2,1, 200, 3);
Note how the m_a and nm_a columns are now Textpointers into the LOB allocation unit:
Slot 1 Column 2 Offset 0x19 Length 4 Length (physical) 4
v_a = 2v_a
Slot 1 Column 3 Offset 0x1d Length 10 Length (physical) 10
nv_a = 2nv_a
Slot 1 Column 4 Offset 0x27 Length 16 Length (physical) 16
m_a = [Textpointer]
TextTimeStamp = 131268608 RowId = (1:182:2)
Slot 1 Column 5 Offset 0x37 Length 16 Length (physical) 16
nm_a = [Textpointer]
TextTimeStamp = 131334144 RowId = (1:182:3)
Slot 1 Column 6 Offset 0x47 Length 16 Length (physical) 16
t = [Textpointer]
TextTimeStamp = 131399680 RowId = (1:182:4)
Slot 1 Column 7 Offset 0x57 Length 16 Length (physical) 16
nt = [Textpointer]
TextTimeStamp = 131465216 RowId = (1:182:5)
For completeness' sake, we can also force one of the non-MAX fields out of row:
update a set v_a = replicate('X', 8000);
dbcc page(2,1, 200, 3);
Note how the v_a column is stored in the Row-Overflow storage:
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
Slot 0 Column 2 Offset 0x19 Length 24 Length (physical) 24
v_a = [BLOB Inline Root]
Level = 0 Unused = 99 UpdateSeq = 1
TimeStamp = 1098383360
Link 0
Size = 8000 RowId = (1:176:0)
So, as others have already commented, the MAX types are stored inline by default, if they fit. For many DW projects this would be unacceptable, because the typical DW load must scan (or at least range scan), so the sp_tableoption ..., 'large value types out of row', '1' option should be used. Note that this does not affect existing rows (in my test, not even an index rebuild did), so the option has to be turned on early.
For most OLTP-type loads, though, the fact that MAX types are stored inline when possible is actually an advantage, since the OLTP access pattern is seeks, and row width has little impact on those.
Nonetheless, regarding the original question: a separate table is not necessary. Turning on the large value types out of row option achieves the same result at zero cost to development and testing.
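Applied to the Log table from the question, that would be something like this (remember it only affects values written after the option is set):

-- Push Details (varchar(max)) values off-row for new writes.
EXEC sp_tableoption 'Log', 'large value types out of row', '1';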
Paradoxically, if your data is normally less than 8000 characters, I would store it in a separate table, while if the data is greater than 8000 characters, I would keep it in the same table.
This is because SQL Server keeps the data in-page as long as the row fits in a single page, but when the data gets larger it moves it out, just like the TEXT data type, and leaves just a pointer in the row. So with a bunch of 3,000-character rows you fit fewer rows per page, which is really inefficient, but with a bunch of 12,000-character rows the data is out of the row, so it's actually more efficient.
Having said this, you typically have a wide-ranging mix of lengths, and thus I would move it into its own table. This also gives you the flexibility to move that table to a different filegroup, etc.
Note that you can also force the data out of the row using sp_tableoption. varchar(max) is basically similar to the TEXT data type, except that it defaults to data in-row (for varchar(max)) instead of data out-of-row (for TEXT).
You should structure your data into whatever seems the most logical structure and allow SQL Server to perform its optimizations as to how to physically store the data.
If you find, through performance analysis, that your structure is a performance problem, then consider performing changes to your structure or to storage settings.
Keep it inline. The whole point of varchar is that it takes up no data space when it's empty, just 5 bytes for 'Hello', and so on (plus a couple of bytes of length overhead).
I would normalize it by creating the Detail table. I assume some of the entries in Log will share the same Detail? If you normalize, you will only store an integer FK id per occurrence, while the text itself is stored once in the Detail table. If you have reasons to denormalize, do it, but from your question I don't see that being the case.
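A minimal sketch of the constraint tying the two tables from the question together (the constraint name is invented, and it assumes DetailID is made the primary key of Detail):

-- Assumes [Detail].DetailID has been declared as the primary key.
ALTER TABLE [Log]
    ADD CONSTRAINT FK_Log_Detail
    FOREIGN KEY (DetailID) REFERENCES [Detail] (DetailID);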
Having a nullable column costs 2 bytes for every 16 of them. If this is the only (or 17th, or 33rd, etc.) nullable column in the table, it will cost you 2 bytes per row; otherwise nothing.