Temptable has columns RunningTotal and ClientCount, and we also have a @RunningTotal variable declared and set to 0.
Can someone please explain what this line does?
UPDATE Temptable
SET @RunningTotal = RunningTotal = @RunningTotal + ClientCount
Never seen this construct before, but it seems to work like this: it fills the RunningTotal column with a cumulative total of ClientCount.
Say we start with a table with just ClientCount filled in:
CREATE TABLE dbo.Temptable (ClientCount int, RunningTotal int)
INSERT INTO Temptable (ClientCount) VALUES (5), (4), (6), (2)
SELECT * FROM Temptable
ClientCount RunningTotal
----------- ------------
5           NULL
4           NULL
6           NULL
2           NULL
And then run the update statement:
DECLARE @RunningTotal int = 0
UPDATE Temptable SET @RunningTotal = RunningTotal = @RunningTotal + ClientCount
SELECT * FROM Temptable
ClientCount RunningTotal
----------- ------------
5           5
4           9
6           15
2           17
As you can see, each value of RunningTotal is the sum of the ClientCount values of the current record and all preceding records.
The downside is that you have no control over the order in which the records are processed, which makes me wonder whether this is a recommended approach in a production environment.
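If ordering matters - and in production it usually does - a windowed SUM on SQL Server 2012+ produces the same running total with an explicit ORDER BY; a minimal sketch, assuming an id column (the sample table above has none) to define the order:
SELECT ClientCount,
       SUM(ClientCount) OVER (ORDER BY id
                              ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM Temptable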
Please check here for a deeper discussion:
Calculate a Running Total in SQL Server
I need to process rows in a table in batches of not less than N rows. Each batch needs to contain an entire group of rows (a group is just another column); i.e., when I select the top N rows from the table for processing, I need to extend that N to cover the last group in the batch rather than splitting the last group between batches.
Sample data:
CREATE TABLE test01 (id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
, person_name NVARCHAR(100)
, person_surname NVARCHAR(100)
, person_group_code CHAR(2) NOT NULL);
INSERT INTO
dbo.test01 (person_name
, person_surname
, person_group_code)
VALUES
('n1', 's1', 'g1')
, ('n2', 's2', 'g1')
, ('n3', 's3', 'g1')
, ('n4', 's4', 'g1')
, ('n5', 's5', 'g2')
, ('n6', 's6', 'g2')
, ('n7', 's7', 'g2')
, ('n8', 's8', 'g2')
, ('n9', 's9', 'g2')
, ('n10', 's10', 'g2')
, ('n11', 's11', 'g3')
, ('n12', 's12', 'g3')
, ('n13', 's13', 'g3')
, ('n14', 's14', 'g3');
My current attempt:
DECLARE @batch_start INT = 1
      , @batch_size INT = 5;
DECLARE @max_id INT = (SELECT MAX(id) FROM dbo.test01);

WHILE @batch_start <= @max_id
BEGIN
    SELECT *
    FROM dbo.test01
    WHERE id BETWEEN @batch_start AND @batch_start + @batch_size - 1;

    SELECT @batch_start += @batch_size;
END;
DROP TABLE dbo.test01;
In the example above, I am splitting the 14 rows into 3 batches: 5 rows in batch #1, another 5 rows in batch #2 and then 4 rows in the final batch.
The first batch (id from 1 to 5) covers only a fraction of the 'g2' group, so I need to extend this batch to cover rows 1-10 (I need to process the entire g2 group in a single batch).
(by the way, I don't mind batch upsizing - I need to make sure I cover at least one full group per batch).
The result would be that batch #1 would cover groups g1 and g2 (10 rows) then batch #2 would cover group g3 (4 rows) and there would be no batch #3 at all.
Now, the real table has billions of rows and batch sizes are around 50K-100K each, so I need a solution that performs well.
Any hints on how to approach this with minimal performance hit?
The first thing I've noticed is that your current code assumes there are no gaps in the identity column - however, that is a mistake. An identity column may (and often does) have gaps in the numbers - so the first thing you want to do is use ROW_NUMBER() OVER (ORDER BY id) to get a continuous running number for all your records.
The second thing I've done is add a column that gives a numeric id to each group, ordered the same way as the identity column - using a well-known technique for solving gaps-and-islands problems.
I've used a table variable to store this data for each id in the source table for the purpose of this demonstration, but you might want to use a temporary table and add indexes on the relevant columns to improve performance.
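For example, the temp-table variant could be sketched like this (the index choice is an assumption, not something benchmarked here):
CREATE TABLE #Helper (Id int, Rn int, GroupId int);
-- Rn drives both the batch window and the GroupId lookup, so index it
CREATE NONCLUSTERED INDEX IX_Helper_Rn ON #Helper (Rn) INCLUDE (Id, GroupId);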
I've also renamed your @batch_size variable to @batch_min_size and added a few other variables.
So here is the table variable I've used:
DECLARE @Helper AS TABLE (Id int, Rn int, GroupId int)

INSERT INTO @Helper (Id, Rn, GroupId)
SELECT Id,
       ROW_NUMBER() OVER(ORDER BY Id) AS Rn,
       ROW_NUMBER() OVER(ORDER BY Id) -
       ROW_NUMBER() OVER(PARTITION BY person_group_code ORDER BY Id) AS GroupId
FROM dbo.test01
This is the content of this table:
Id  Rn  GroupId
1   1   0
2   2   0
3   3   0
4   4   0
5   5   4
6   6   4
7   7   4
8   8   4
9   9   4
10  10  4
11  11  10
12  12  10
13  13  10
14  14  10
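Note that the GroupId values (0, 4, 10) are simply the difference of the two row numbers; all that matters is that each group gets one constant value that increases with Id, which is exactly what the batching logic below relies on.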
I've used a while loop to do the batches.
In the loop, I've used this table to calculate the first and last id of each batch, as well as the last row number of the batch.
Then all I had to do was to use the first and last id in the where clause of the original table:
DECLARE @batch_min_size int = 10
      , @batch_end int = 0
      , @batch_start int
      , @first_id_of_batch int
      , @last_id_of_batch int
      , @total_row_count int;

SELECT @total_row_count = COUNT(*) FROM dbo.test01

WHILE @batch_end < @total_row_count
BEGIN
    SELECT @batch_start = @batch_end + 1;

    SELECT @batch_end = MAX(Rn)
         , @first_id_of_batch = MIN(Id)
         , @last_id_of_batch = MAX(Id)
    FROM @Helper
    WHERE Rn >= @batch_start
    AND GroupId <=
    (
        SELECT MAX(GroupId)
        FROM @Helper
        WHERE Rn <= @batch_start + @batch_min_size - 1
    )

    SELECT id, person_name, person_surname, person_group_code
    FROM dbo.test01
    WHERE Id >= @first_id_of_batch
    AND Id <= @last_id_of_batch
END
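With the sample data and @batch_min_size = 10 this yields exactly the batches described in the question: Ids 1-10 (groups g1 and g2) in the first batch, Ids 11-14 (group g3) in the second, and no third batch.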
See a live demo on rextester.
See if below helps:
CREATE TABLE #Temp (g_record_count int, groupname varchar(50))

INSERT INTO #Temp (g_record_count, groupname)
SELECT MAX(id), person_group_code
FROM dbo.test01
GROUP BY person_group_code
After this, loop through this temporary table (reusing @batch_start, @batch_size and @max_id as declared in the question):
DECLARE @rec_per_batch INT = 1;

WHILE @batch_start <= @max_id
BEGIN
    -- last id of the first group whose ending id reaches past the minimum batch size
    SELECT @rec_per_batch = ISNULL(MIN(g_record_count), @max_id)
    FROM #Temp
    WHERE g_record_count >= @batch_start + @batch_size - 1;

    SELECT *
    FROM dbo.test01
    WHERE id BETWEEN @batch_start AND @rec_per_batch;

    SELECT @batch_start = @rec_per_batch + 1;
END;
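Note that, like the loop in the question, this relies on the ids being gap-free and on each group occupying a contiguous id range - the same caveat the first answer raises.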
Anyone know an efficient way to write a query to compare a current month's revenue to the average monthly revenue for the past 6 months?
Here's an example with just 2 columns, the actual month and the month's revenue
Columns:
MonthYear  RevenueAmt
Jan2017    120
Dec2016    75
Nov2016    50
Oct2016    100
Sep2016    75
Aug2016    100
Jul2016    100
so....the average of the previous 6 months (Jul 2016 to Dec 2016) is
(75 + 50 + 100 + 75 + 100 + 100) = 500
500 / 6 = 83.33
The current month (Jan2017) is 120,
so the difference becomes:
120 - 83.33 = 36.67
So, Jan2017 is 36.67 higher than the average of its past 6 months.
You can use window functions and set the frame via ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING.
This is a rolling variance, and I did make one modification... I used an actual date so we can set a proper ORDER BY in the OVER clause.
Edit: I added the Prior6MthAvg column to illustrate the math
Declare @YourTable table (MonthYear Date, RevenueAmt int)

Insert Into @YourTable values
('2017-01-01',120),
('2016-12-01',75),
('2016-11-01',50),
('2016-10-01',100),
('2016-09-01',75),
('2016-08-01',100),
('2016-07-01',100)

Select A.*
      ,Prior6MthAvg = avg(RevenueAmt+0.0) over (Order By MonthYear ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
      ,Variance = RevenueAmt - avg(RevenueAmt+0.0) over (Order By MonthYear ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
From @YourTable A
Order by MonthYear Desc
Returns
MonthYear   RevenueAmt  Prior6MthAvg  Variance
2017-01-01  120         83.333333     36.666667
2016-12-01  75          85.000000     -10.000000
2016-11-01  50          93.750000     -43.750000
2016-10-01  100         91.666666     8.333334
2016-09-01  75          100.000000    -25.000000
2016-08-01  100         100.000000    0.000000
2016-07-01  100         NULL          NULL
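One possible refinement, sketched in the same style: if you only want a variance where a full six prior months exist, count the rows in the frame and filter on it (with this data only the 2017-01-01 row survives):
Select *
From (
      Select A.*
            ,Prior6MthAvg = avg(RevenueAmt+0.0) over (Order By MonthYear ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
            ,PriorMthCnt = count(*) over (Order By MonthYear ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
      From @YourTable A
     ) B
Where PriorMthCnt = 6
Order by MonthYear Desc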
I was going to comment, but it is more of an answer....
Please note: efficiency, and planning thereof, requires a broader scope of design. That said, in general - assuming a larger amount of data - the most optimal way (in my experience) is to keep a separate table for the "running averages."
To explain, I would keep a separate table (a single record, if there is only one set of data, i.e. not company-based), keyed by the current month, holding the running average AND the 6-month total.
Once done, add a trigger for adding a new month (again, in general, or per company, if divided) that subtracts the oldest month's value and adds the current one.
Then, either the total, average or both are stored separately, and quickly accessed.
Once done - a simple join will bring the total/average into any given query.
Edited to add: (NOTE: my T-SQL may be rusty, but this should give you the idea)
NOTE: it assumes a base table of ClientData, with a secondary table SalesAvg, linked by id (of the rep) and the date (the month marker; if the date can vary, you would need to split out month/year for the key and link). Pulling it from the same table as given will basically tax the server at the point of query. This method distributes the work to the point of insertion (normally more spread out) and, as the average is keyed, allows the quickest retrieval using an inner join.
CREATE TRIGGER UpdateSalesAvg
ON schema.ClientData
AFTER INSERT
AS
DECLARE @ID AS INT;
DECLARE @Date AS DATE;
DECLARE @Value AS DECIMAL(18,2); -- the value column's type is an assumption
DECLARE @Count AS INT;

-- T-SQL has no new.* row reference; read from the inserted pseudo-table
-- (this sketch assumes single-row inserts)
SELECT @ID = id, @Date = monthdate, @Value = value FROM inserted;

SELECT @Count = ISNULL(RunCount, 0) FROM schema.SalesAvg
WHERE id = @ID AND monthdate = @Date;

IF (@Count IS NULL OR @Count = 0) -- no row yet for this id/month
BEGIN
    INSERT INTO schema.SalesAvg (id, monthdate, RunAvg, RunCount, LastPost)
    VALUES (@ID, @Date, @Value, 1, @Value);
END
ELSE IF (@Count = 6)
BEGIN
    UPDATE schema.SalesAvg
    SET RunAvg = ROUND(((RunAvg * 6) - LastPost + @Value) / 6, 2),
        LastPost = @Value
    WHERE id = @ID
    AND monthdate = @Date;
END
ELSE
BEGIN
    UPDATE schema.SalesAvg
    SET RunAvg = ROUND((RunAvg * RunCount + @Value) / (RunCount + 1), 2),
        RunCount = RunCount + 1,
        LastPost = @Value
    WHERE id = @ID
    AND monthdate = @Date;
END
GO
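Once SalesAvg is maintained, the "simple join" mentioned above might look like this (a sketch reusing the same assumed names):
SELECT c.*, a.RunAvg, a.RunCount
FROM schema.ClientData AS c
INNER JOIN schema.SalesAvg AS a
        ON a.id = c.id
       AND a.monthdate = c.monthdate;
Also bear in mind that SQL Server fires an AFTER INSERT trigger once per statement, not once per row, so a production version would need to process the inserted pseudo-table set-wise rather than assume a single row.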
I am struggling with the following:
Counter  Period
-------  ----------
1        2012-02-09
1        2012-02-09
1        2012-02-08
2        2012-02-07
2        2012-02-07
2        2012-02-07
3        2012-02-06
3        2012-02-06
I don't know what function to use, or how, to add a counter column that divides the period rows in the table into groups of 3. It keeps assigning the same counter value until 3 rows are filled, then moves on to the next counter value, with any leftover rows taking the final value (as shown above). In the example above @n is 3.
I have looked at NTILE, but that does not work, as it just divides the rows into n groups.
Help will be greatly appreciated.
It's possible you need to clarify your question; if I use NTILE() I get the result you're looking for (if you include an ID):
declare @tableA table(id int identity, col1 date)

insert into @tableA values ('2012-02-09')
insert into @tableA values ('2012-02-09')
insert into @tableA values ('2012-02-08')
insert into @tableA values ('2012-02-07')
insert into @tableA values ('2012-02-07')
insert into @tableA values ('2012-02-07')
insert into @tableA values ('2012-02-06')
insert into @tableA values ('2012-02-06')

select ntile(3) over (order by id) counter, col1 Period from @tableA
Results:
counter  Period
-------  ----------
1        2012-02-09
1        2012-02-09
1        2012-02-08
2        2012-02-07
2        2012-02-07
2        2012-02-07
3        2012-02-06
3        2012-02-06
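Note the distinction, though: NTILE(3) always produces three groups of near-equal size, so it matches the expected output here only because 8 rows happen to split 3/3/2. For a fixed group size of n rows regardless of the total, see the ROW_NUMBER() approach below.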
Are you looking for something like:
declare @n as int = 3

SELECT
    ((ROW_NUMBER() over (order by period desc) - 1) / @n) + 1 as counter,
    [period]
FROM [a].[dbo].[a]
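The integer division does the work here: (ROW_NUMBER() - 1) / @n yields 0 for the first @n rows, 1 for the next @n, and so on, so adding 1 gives counters 1, 2, 3, ... with any leftover rows falling into the final counter value.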
I have one table (Stock_ID, Stock_Name). I want to write a stored procedure in SQL Server that generates a running number for Stock_ID in a format like xxxx/12 (xxxx = a number running from 0001 to 9999; 12 is the last 2 digits of the current year).
My scenario is that when the year changes, the running number resets, e.g. to 0001/13.
What do you intend to do when you hit more than 9999 in a single year??? It may sound impossible, but I've had to deal with so many "it will never happen" data-related design mess-ups over the years from code-first, design-later developers. These are major pains, depending on how many places you need to fix, since these items are usually primary keys and foreign keys used all over.
This looks like a system requirement to SHOW the data this way, but it is the developer's responsibility to design the internals of the application. The way you store it and the way you display it don't need to be identical. I'd split that into two columns, using an int for the number portion and a tinyint for the 2-digit year portion. You can use a computed column for quick and easy display (persist it and index it if necessary), where you pad with leading zeros and add the slash. Throw in a check constraint on the year portion to make sure it stays within a reasonable range. You can make the number portion an identity and just have a job reseed it back to 1 every New Year's Eve.
try it out:
--drop table YourTable
--create the basic table
CREATE TABLE YourTable
(YourNumber int identity(1,1) not null
,YourYear tinyint not null
,YourData varchar(10)
,CHECK (YourYear>=12 and YourYear<=25) --optional check constraint
)
--add the persisted computed column
ALTER TABLE YourTable ADD YourFormattedNumber AS ISNULL(RIGHT('0000'+CONVERT(varchar(10),YourNumber),4)+'/'+RIGHT(CONVERT(varchar(10),YourYear),2),'/') PERSISTED
--make the persisted computed column the primary key
ALTER TABLE YourTable ADD CONSTRAINT PK_YourTable PRIMARY KEY CLUSTERED (YourFormattedNumber)
sample data:
--insert rows in 2012
insert into YourTable values (12,'aaaa')
insert into YourTable values (12,'bbbb')
insert into YourTable values (12,'cccc')
--new years eve job run this
DBCC CHECKIDENT (YourTable, RESEED, 0)
--insert rows in 2013
insert into YourTable values (13,'aaaa')
insert into YourTable values (13,'bbbb')
select * from YourTable order by YourYear,YourNumber
OUTPUT:
YourNumber  YourYear  YourData  YourFormattedNumber
----------  --------  --------  -------------------
1           12        aaaa      0001/12
2           12        bbbb      0002/12
3           12        cccc      0003/12
1           13        aaaa      0001/13
2           13        bbbb      0002/13

(5 row(s) affected)
to handle the possibility of more than 9999 rows per year try a different computed column calculation:
CREATE TABLE YourTable
(YourNumber int identity(9998,1) not null --<<<notice the identity starting point, so it hits 9999 quicker for this simple test
,YourYear tinyint not null
,YourData varchar(10)
)
--handles more than 9999 values per year
ALTER TABLE YourTable ADD YourFormattedNumber AS ISNULL(RIGHT(REPLICATE('0',CASE WHEN LEN(CONVERT(varchar(10),YourNumber))<4 THEN 4 ELSE 1 END)+CONVERT(varchar(10),YourNumber),CASE WHEN LEN(CONVERT(varchar(10),YourNumber))<4 THEN 4 ELSE LEN(CONVERT(varchar(10),YourNumber)) END)+'/'+RIGHT(CONVERT(varchar(10),YourYear),2),'/') PERSISTED
ALTER TABLE YourTable ADD CONSTRAINT PK_YourTable PRIMARY KEY CLUSTERED (YourFormattedNumber)
sample data:
insert into YourTable values (12,'aaaa')
insert into YourTable values (12,'bbbb')
insert into YourTable values (12,'cccc')
DBCC CHECKIDENT (YourTable, RESEED, 0) --new years eve job run this
insert into YourTable values (13,'aaaa')
insert into YourTable values (13,'bbbb')
select * from YourTable order by YourYear,YourNumber
OUTPUT:
YourNumber  YourYear  YourData  YourFormattedNumber
----------  --------  --------  -------------------
9998        12        aaaa      9998/12
9999        12        bbbb      9999/12
10000       12        cccc      10000/12
1           13        aaaa      0001/13
2           13        bbbb      0002/13

(5 row(s) affected)
This might help:
DECLARE @tbl TABLE(Stock_ID INT, Stock_Name VARCHAR(100))

INSERT INTO @tbl
SELECT 1,'Test'
UNION ALL
SELECT 2,'Test2'

DECLARE @ShortDate VARCHAR(2) = RIGHT(CAST(YEAR(GETDATE()) AS VARCHAR(4)), 2)

;WITH CTE AS
(
    SELECT
        CAST(ROW_NUMBER() OVER(ORDER BY tbl.Stock_ID) AS VARCHAR(4)) AS RowNbr,
        tbl.Stock_ID,
        tbl.Stock_Name
    FROM
        @tbl AS tbl
)
SELECT
    REPLICATE('0', 4 - LEN(RowNbr)) + CTE.RowNbr + '/' + @ShortDate AS YourColumn,
    CTE.Stock_ID,
    CTE.Stock_Name
FROM
    CTE
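Note this numbers the rows on the fly at query time from whatever currently sits in the table; it does not persist a running number, so it suits display more than generating the next Stock_ID.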
From memory, this is a way to get the next id:
declare @maxid int

-- if there are no rows for the current year this yields 1, otherwise the next id
select @maxid = isnull(max(convert(int, substring(Stock_Id, 1, 4))) + 1, 1)
from YourStockTable -- placeholder for the table holding the formatted Stock_Id values
where substring(Stock_Id, 6, 2) = right(convert(varchar(4), year(getdate())), 2)

declare @nextid varchar(7)
select @nextid = right('0000' + convert(varchar(10), @maxid), 4) + '/' + right(convert(varchar(4), year(getdate())), 2)
I have a table that keeps track of transactions for various accounts:
AccountTransactions
AccountTransactionID int NOT NULL (PK)
AccountID int NOT NULL (FK)
Amount decimal NOT NULL
Upon inserting a record with a negative amount into this table, I need to verify that the SUM of the amount column for the specified account is greater than zero. If the new record will cause this SUM to fall below zero, the record should not be inserted.
For example, if I have the following records, inserting an amount of -8.00 for AccountID 5 should not be allowed:
AccountTransactionID  AccountID  Amount
--------------------  ---------  ------
1                     5          10.00
2                     6          15.00
3                     5          -3.00
What is the best method to accomplish this? Check constraint, trigger, or just check for this condition in a stored procedure?
You can do a simple check, including the amount you are about to insert so the balance cannot go negative:
DECLARE @TheSum decimal(18,2)

SET @TheSum = (SELECT ISNULL(SUM(MyCol), 0) FROM MyTable WHERE AccountID = @SomeParameter)

-- @NewAmount is the (possibly negative) amount about to be inserted
IF @TheSum + @NewAmount >= 0
BEGIN
    --do your insert
END
...
You could add a where clause to your insert:
insert YourTable
    (AccountID, Amount)
select @AccountID, @Amount
where 0 <=
(
    select @Amount + isnull(sum(Amount), 0) -- isnull covers the account's first transaction
    from YourTable
    where AccountID = @AccountID
)
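One caveat with both approaches: under concurrent inserts, two sessions can each pass the check before either row is written. A common mitigation (a sketch; whether you need it depends on your isolation requirements) is to take update locks inside a transaction:
BEGIN TRAN;

insert YourTable
    (AccountID, Amount)
select @AccountID, @Amount
where 0 <=
(
    select @Amount + isnull(sum(Amount), 0)
    from YourTable with (updlock, holdlock) -- serialize concurrent balance checks
    where AccountID = @AccountID
);

COMMIT TRAN;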