SQL Server - poor performance during INSERT transaction

I have a stored procedure which executes a query and returns the row into variables, like below:
SELECT @item_id = I.ID, @label_ID = SL.label_id
FROM tb_A I
LEFT JOIN tb_B SL ON I.ID = SL.item_id
WHERE I.NUMBER = @VAR
I have an IF to check whether @label_ID is null. If it is null, the procedure goes to an INSERT statement; otherwise it goes to an UPDATE statement. Let's focus on the INSERT, where I know I'm having problems. The INSERT part looks like this:
IF @label_ID IS NULL
BEGIN
    INSERT INTO tb_B (item_id, label_qrcode, label_barcode, data_leitura, data_inclusao)
    VALUES (@item_id, @label_qrcode, @label_barcode, @data_leitura, GETDATE())
END
So, tb_B has a PK on the ID column and an FK on the item_id column, which references the ID column of tb_A.
I ran SQL Server Profiler and saw that this stored procedure sometimes takes around 2,300 ms, while its normal average is 16 ms.
I ran the "Execution Plan" and the biggest cost is in the "Clustered Index Insert" component. Showing below:
(Screenshots of the estimated execution plan, actual execution plan, and operator details omitted.)
More details about the tables:
tb_A storage: 45,988,842 rows; data space 5,444.297 MB; index space 6,853.188 MB
tb_B storage: 15,552,847 rows; data space 1,663.281 MB; index space 1,681.688 MB
Statistics for INDEX 'PK_tb_B':

Name     Updated             Rows      Rows Sampled  Steps  Density  Average Key Length  String Index  Unfiltered Rows
-------  ------------------  --------  ------------  -----  -------  ------------------  ------------  ---------------
PK_tb_B  Sep 23 2018 2:30AM  15369616  15369616      5      1        4                   NO            15369616

All Density   Average Length  Columns
------------  --------------  -------
6.506343E-08  4               id

Histogram steps:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
1             0           1        0                    1
8192841       8192198     1        8192198              1
8270245       65535       1        65535                1
15383143      7111878     1        7111878              1
15383144      0           1        0                    1
Statistics for INDEX 'IDX_tb_B_ITEM_ID':

Name              Updated             Rows      Rows Sampled  Steps  Density  Average Key Length  String Index  Unfiltered Rows
----------------  ------------------  --------  ------------  -----  -------  ------------------  ------------  ---------------
IDX_tb_B_ITEM_ID  Sep 23 2018 2:30AM  15369616  15369616      12     1        7.999424            NO            15369616

All Density   Average Length  Columns
------------  --------------  -----------
6.50728E-08   3.999424        item_id
6.506343E-08  7.999424        item_id, id

Histogram steps:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
0 2214 0 1
16549857      0           1        0                    1
29907650      65734       1        65734                1
32097131      131071      1        131071               1
32296132      196607      1        196607               1
32406913      98303       1        98303                1
40163331      7700479     1        7700479              1
40237216      65535       1        65535                1
47234636      6946815     1        6946815              1
47387143      131071      1        131071               1
47439431      31776       1        31776                1
47439440      0           1        0                    1
(Screenshots of index fragmentation for PK_tb_B and IDX_tb_B_ITEM_ID omitted.)
Are there any best practices I can apply to make this execution duration stable?
Hope you can help me! Thanks in advance.

The problem is probably the data type of the clustered index key. A clustered index stores the table's rows ordered by the key values, and by default the primary key is created as a clustered index. That is often the best place for it, but not always. If you have, for example, a clustered index over an NVARCHAR column, every INSERT has to find the right place in the index for the new record. Say your table has a million rows ordered alphabetically and the new record starts with "A": it has to be slotted in among the existing "A" records, which can force page splits and shuffling of existing rows to make room. If the new record starts with "Z" there is less to move, but that still doesn't make it fine. If you don't have a column that lets you insert new records sequentially, you can create an identity column for this, or use another column that is logically sequential for any new transaction regardless of the source system, for example a datetime column that records the moment the insert occurs.
If you want more info, please check the Microsoft documentation on clustered and nonclustered indexes.
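As a rough sketch of that advice (the column types below are assumptions for illustration, not the asker's real schema), an ever-increasing IDENTITY key clustered on tb_B means new rows always land at the end of the index instead of forcing splits in the middle:
-- Hypothetical sketch only: cluster on a sequential IDENTITY key; column types are assumed
CREATE TABLE tb_B (
    id            INT IDENTITY(1,1) NOT NULL,
    item_id       INT NOT NULL,
    label_qrcode  VARCHAR(100) NULL,
    label_barcode VARCHAR(100) NULL,
    data_leitura  DATETIME NULL,
    data_inclusao DATETIME NOT NULL CONSTRAINT DF_tb_B_data_inclusao DEFAULT (GETDATE()),
    CONSTRAINT PK_tb_B PRIMARY KEY CLUSTERED (id),  -- inserts append at the end of the index
    CONSTRAINT FK_tb_B_tb_A FOREIGN KEY (item_id) REFERENCES tb_A (ID)
);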

Related

How to get number-containing rows in Snowflake

I have tried REGEXP, REGEXP_LIKE and LIKE, but they didn't work:
select * from b
where regexp_like(col1, '\d')
where regexp_like(col1, '[0-9]')
....etc
We have this table:
Col1
------------------
avr100000
adfdsgwr
20170910020359.761
Enterprise
adf56ds76gwr
0+093000
080000
adfdsgwr
The output should be these 5 rows:
Col1
------------------
avr100000
20170910020359.761
adf56ds76gwr
0+093000
1080000
Thanks
You can use regexp_instr in the where clause to see if it finds a digit anywhere in the string:
create temp table b(col1 string);
insert into b (col1) values ('avr100000'), ('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adf56ds76gwr'),
('0+093000'),
('080000'),
('adfdsgwr')
;
select col1 from b where regexp_instr(col1, '\\d') > 0;
I'm updating my answer to note that regexp_instr is going to perform about 3.8 times faster than using regexp_count for this requirement.
The reason is that regexp_instr will stop and report the location of the first digit it encounters. In contrast, regexp_count will continue examining the string until it reaches its end. If we only want to know if a digit exists in a string, we can stop as soon as we encounter the first one.
If it is a small data set, this won't matter much. For large data sets, that 3.8 times faster makes a big difference. Here is a mini test harness that shows the performance difference:
create or replace transient table RANDOM_STRINGS as
select RANDSTR(50, random()) as RANDSTR from table(generator (rowcount => 10000000));
alter session set use_cached_result = false;
-- Run these statements multiple times on an X-Small warehouse to test performance
-- Run both to warm the cache, then note the times after the initial runs to warm the cache
-- Average over 10 times with warm cache: 3.315s
select count(*) as ROWS_WITH_NUMBERS
from RANDOM_STRINGS where regexp_count(randstr, '\\d') > 0;
-- Average over 10 times with warm cache: 0.8686s
select count(*) as ROWS_WITH_NUMBERS
from RANDOM_STRINGS where regexp_instr(randstr, '\\d') > 0;
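As an aside on why the REGEXP_LIKE attempts in the question returned nothing: Snowflake implicitly anchors the pattern to the entire string, so '\\d' only matches a value that is exactly one digit. Allowing anything around the digit makes it work (a small sketch against the temp table b created above):
-- REGEXP_LIKE must match the whole string, so wrap the digit in .* on both sides
select col1 from b where regexp_like(col1, '.*\\d.*');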
One method is to count how many alphabetic characters there are:
select column1 as input
,regexp_count(column1, '[A-Za-z]') as alpha_count
from values
('0-100000'),
('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adfdsgwr'),
('0+093000'),
('1-080000'),
('adfdsgwr')
INPUT               ALPHA_COUNT
------------------  -----------
0-100000            0
adfdsgwr            8
20170910020359.761  0
Enterprise          10
adfdsgwr            8
0+093000            0
1-080000            0
adfdsgwr            8
and thus exclude those where it is not zero:
select column1 as input
from values
('0-100000'),
('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adfdsgwr'),
('0+093000'),
('1-080000'),
('adfdsgwr')
where regexp_count(column1, '[A-Za-z]') = 0
gives:
INPUT
------------------
0-100000
20170910020359.761
0+093000
1-080000
All you need to do is find 1 or more instances of a numeric value:
select
column1 as input
from
values
('avr100000'),
('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adf56ds76gwr'),
('0+093000'),
('1-080000'),
('adfdsgwr')
where
regexp_count (column1, '\\d') > 0;
Results:
INPUT
------------------
avr100000
20170910020359.761
adf56ds76gwr
0+093000
1-080000

SQL Equivalent of Countif or Index Match on row level instead of columnar level

I have a table comprising 30 columns. Of these, 5 are text fields with details pertaining to the entry, and 25 are value fields named Val00, Val01, Val02, ... up to Val24.
Based on logic applied elsewhere, the value fields hold a value for the first n columns and then drop to 0 for all subsequent columns.
e.g. when n is 5 the output will be:
Val00  Val01  Val02  Val03  Val04  Val05  Val06  Val07  ...  Val24
-----  -----  -----  -----  -----  -----  -----  -----  ---  -----
1.5    1.5    1.5    1.5    1.5    0      0      0      ...  0
As can be seen, all values starting from Val05 drop to 0, and every column from Val05 to Val24 is 0.
Given this output, I want to find what this n value is and create a new column ValCount to store this value in it.
In Excel this would be fairly straightforward to achieve with the formula below:
=COUNTIF(Val00:Val24,">0")
However I'm not sure how we would go about in terms of SQL. I understand that the count function works on a columnar level and not on a row level.
I found a solution which is rather long, but it should do the job, so I hope it helps you.
SELECT CASE WHEN Val00 >= 1 THEN 1 ELSE 0 END
     + CASE WHEN Val01 >= 1 THEN 1 ELSE 0 END
     + CASE WHEN Val02 >= 1 THEN 1 ELSE 0 END
     -- ...repeat the CASE for Val03 through Val24...
       AS ValCount
FROM YourTable  -- replace with your table name
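If you want ValCount stored on the table rather than computed in every query, a computed column is one option. This is only a sketch, assuming SQL Server and a hypothetical table name MyValueTable; repeat the CASE for every Val column:
-- Hypothetical: persist the per-row count as a computed column (extend through Val24)
ALTER TABLE MyValueTable ADD ValCount AS
    (CASE WHEN Val00 >= 1 THEN 1 ELSE 0 END
   + CASE WHEN Val01 >= 1 THEN 1 ELSE 0 END
   + CASE WHEN Val02 >= 1 THEN 1 ELSE 0 END
   -- ...continue the pattern...
   + CASE WHEN Val24 >= 1 THEN 1 ELSE 0 END);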

Space consumed by a particular column's data and impact on deleting that column

I am using an Oracle 12c database in my project and I have a column "Name" of type VARCHAR2(128 CHAR) NOT NULL. I have approximately 25,328,687 rows in my table.
Now I don't need the "Name" column any more, so I want to drop it. When I calculated the total size of the data in this column (using LENGTHB and VSIZE) across all rows, it was approximately 1.07 GB.
Since the maximum size of the data in this column is specified, shouldn't every row be allocated 128 bytes for this column (ignoring Unicode for simplicity), so that the total space consumed by the column is 128 * number of rows = 3,242,071,936 bytes, or about 3.24 GB?
Oracle VARCHAR2 allocates storage dynamically (the definition says "variable-length character string"), whereas CHAR is a fixed-length character data type.
create table x (a char(5), b varchar2(5));
insert into x values ('RAM', 'RAM');
insert into x values ('RAMA', 'RAMA');
insert into x values ('RAMAN', 'RAMAN');
SELECT * FROM x WHERE length(a) = 3; -- returns 0 records
SELECT * FROM x WHERE length(b) = 3; -- returns 1 record (RAM)
SELECT length(a) AS len_a, length(b) AS len_b FROM x;
The output will be like below:
len_a | len_b
-------------
    5 |     3
    5 |     4
    5 |     5
Oracle does dynamic allocation for VARCHAR2. So a string of 4 characters takes 5 bytes: one for the length and 4 for the characters, assuming a single-byte character set.
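To see the difference directly (a quick sketch against the same table x, assuming a single-byte character set), VSIZE reports the bytes actually stored for each value:
-- CHAR is blank-padded to its declared length; VARCHAR2 stores only the actual characters
SELECT a, vsize(a) AS bytes_a, b, vsize(b) AS bytes_b FROM x;
-- bytes_a is 5 for every row; bytes_b is 3, 4 and 5 (the length byte is separate row overhead)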
As the other answers say, the storage that a VARCHAR2 column uses is VARying. To get an estimate of the actual amount, you can use
1) The data dictionary
SELECT column_name, avg_col_len, last_analyzed
FROM ALL_TAB_COL_STATISTICS
WHERE owner = 'MY_SCHEMA'
AND table_name = 'MY_TABLE'
AND column_name = 'MY_COLUMN';
The result avg_col_len is the average column length. Multiply it by your number of rows (25328687) and you get an estimate of roughly how many bytes this column uses. (If last_analyzed is NULL or very old compared to the last big data change, you'll have to refresh the optimizer stats with DBMS_STATS.GATHER_TABLE_STATS('MY_SCHEMA','MY_TABLE') first.)
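(As a rough sanity check against the numbers in the question: the measured ~1.07 GB over 25,328,687 rows works out to an average of only about 42-45 bytes per value, far below the declared 128-byte maximum, which is exactly the point of VARCHAR2.)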
2) Count it yourself on a sample
SELECT sum(s), count(*), avg(s), stddev(s)
FROM (
SELECT vsize(my_column) as s
FROM my_schema.my_table SAMPLE (0.1)
);
This calculates the storage size of a 0.1 percent sample of your table.
3) To know for sure, I'd do a test with a subset of the data
CREATE TABLE my_test TABLESPACE my_scratch_tablespace NOLOGGING AS
SELECT * FROM my_schema.my_table SAMPLE (0.1);
-- get the size of the test table in megabytes
SELECT round(bytes/1024/1024) AS mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
-- now drop the column
ALTER TABLE my_test DROP (my_column);
-- and measure again (the segment will not shrink until the table is reorganized)
SELECT round(bytes/1024/1024) AS mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
-- check how much space is actually freed up after reorganizing
ALTER TABLE my_test MOVE;
SELECT round(bytes/1024/1024) AS mb
FROM dba_segments WHERE owner='MY_SCHEMA' AND segment_name='MY_TEST';
You could improve the test by using the same PCTFREE and COMPRESSION levels on your test table.

In SSRS, how can I add a row to aggregate all the rows that don't match a filter?

I'm working on a report that shows transactions grouped by type.
Type Total income
------- --------------
A 575
B 244
C 128
D 45
E 5
F 3
Total 1000
I only want to provide details for transaction types that represent more than 10% of the total income (i.e. A-C). I'm able to do this by applying a filter to the group:
Type Total income
------- --------------
A 575
B 244
C 128
Total 1000
What I want to display is a single row just above the total row that has a total for all the types that have been filtered out (i.e. the sum of D-F):
Type Total income
------- --------------
A 575
B 244
C 128
Other 53
Total 1000
Is this even possible? I've tried using running totals and conditionally hidden rows within the group. I've tried Iif inside Sum. Nothing quite seems to do what I need and I'm butting up against scope issues (e.g. "the value expression has a nested aggregate that specifies a dataset scope").
If anyone can give me any pointers, I'd be really grateful.
EDIT: Should have specified, but at present the dataset actually returns individual transactions:
ID Type Amount
---- ------ --------
1 A 4
2 A 2
3 B 6
4 A 5
5 B 5
The grouping is done using a row group in the tablix.
One solution is to solve that in the SQL source of your dataset instead of inside SSRS:
SELECT
CASE
WHEN CAST([Total income] AS FLOAT) / SUM([Total income]) OVER (PARTITION BY 1) >= 0.10 THEN [Type]
ELSE 'Other'
END AS [Type]
, [Total income]
FROM Source_Table
See also SQL Fiddle
Try to solve this in SQL, see SQL Fiddle.
SELECT I.*
,(
CASE
WHEN I.TotalIncome >= (SELECT Sum(I2.TotalIncome) / 10 FROM Income I2) THEN 10
ELSE 1
END
) AS TotalIncomePercent
FROM Income I
After this, create two sum groups.
SUM(TotalIncome * TotalIncomePercent) / 10
SUM(TotalIncome * TotalIncomePercent)
A second approach may be to use a calculated field in SSRS: try to create a calculated field with the above CASE expression. If SSRS allows you to create it, you can use it the same way as in the SQL approach.
1) To show only the income types greater than 10%, use a row visibility condition like
=IIf(ReportItems!total_income.Value / 10 <= I.totalincome, True, False)
Here ReportItems!total_income.Value is the textbox holding the total of all income (the total of the detail group), and I.totalincome is the current field value.
2) Add one more row outside the detail group for the "Other" income, and use an expression like
=ReportItems!total_income.Value - Sum(IIf(ReportItems!total_income.Value / 10 <= I.totalincome, I.totalincome, Nothing))

Ensure data integrity in SQL Server

I have to make some changes in a small system that stores data in one table, as follows:
TransId TermId StartDate EndDate IsActiveTerm
------- ------ ---------- ---------- ------------
1 1 2007-01-01 2007-12-31 0
1 2 2008-01-01 2008-12-31 0
1 3 2009-01-01 2009-12-31 1
1 4 2010-01-01 2010-12-31 0
2 1 2008-08-05 2009-08-04 0
2 2 2009-08-05 2010-08-04 1
3 1 2009-07-31 2010-07-30 1
3 2 2010-07-31 2011-07-30 0
where the rules are:
- StartDate must be the previous term's EndDate + 1 day (terms cannot overlap)
- there are many terms per transaction
- a term's length is from 1 to n days (I made it 1 year to keep this example simple)
NOTE: IsActiveTerm is a computed column that depends on the current date, so it is not deterministic.
I need to ensure that terms do not overlap. In other words, I want to enforce this condition even when inserting/updating multiple rows.
What I am thinking of is adding INSTEAD OF triggers (for both INSERT and UPDATE), but that seems to require cursors, since I need to cope with multiple rows.
Does anyone have a better idea?
You can find pretty much everything about temporal databases in: Richard T. Snodgrass, "Developing Time-Oriented Database Applications in SQL", Morgan Kaufmann (2000), which I believe is out of print but can be downloaded via the link on his publication list.
I've got a working solution:
CREATE TRIGGER TransTerms_EnsureCon ON TransTerms
FOR INSERT, UPDATE, DELETE AS
BEGIN
    IF EXISTS (SELECT *
               FROM TransTerms pT
               INNER JOIN TransTerms nT
                   ON pT.TransId = nT.TransId
                   AND nT.TransTermId = pT.TransTermId + 1
               WHERE nT.StartDate != DATEADD(d, 1, pT.EndDate)
                 AND pT.EndDate > pT.StartDate
                 AND nT.EndDate > nT.StartDate)
    BEGIN
        -- severity 16 so the violation surfaces as an error to the caller
        RAISERROR('Transaction violates sequenced CONSTRAINT', 16, 2)
        ROLLBACK TRANSACTION
    END
END
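A quick way to smoke-test the trigger (a sketch only; the column list is inferred from the trigger and sample data above and may not match the real schema): insert a first term, then a second term whose StartDate is not the previous EndDate + 1 day, and confirm the second statement raises the error and is rolled back.
-- the second insert leaves a gap after 2020-12-31, so the trigger should reject it
INSERT INTO TransTerms (TransId, TransTermId, StartDate, EndDate) VALUES (99, 1, '2020-01-01', '2020-12-31');
INSERT INTO TransTerms (TransId, TransTermId, StartDate, EndDate) VALUES (99, 2, '2021-01-05', '2021-12-31');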
P.S. Many thanks wallenborn!
