My table looks like the one below.
I am computing the average over the whole table and I get 14, which is fine.
declare @Table table (Student Varchar(10), Score int)
insert into @Table
select 'A',10
union all
select 'B',20
union all
select 'A',10
union all
select 'C',20
union all
select 'B',10
select avg(cast(Score as float)) AvgScore from @Table
AvgScore
--------
14
select Student, avg(cast(Score as float)) AvgScore from @Table group by Grouping sets(Student,())
Student AvgScore
------------------
A 10
B 15
C 20
NULL 14
If I average the per-student averages, (10+15+20)/3 = 15, I do not get 14.
How can I overcome this?
Is my arithmetic wrong?
Can anyone give me a brief explanation?
Thanks in advance.
The total average is taken over all rows, so:
(10 + 20 + 10 + 20 + 10) / 5 = 70 / 5 = 14
Everything is OK. You are trying to calculate an average of averages, (10+15+20)/3, which is not mathematically equivalent because the groups have different sizes.
Look at this example:
A - 1
A - 1
A - 1
A - 1
B - 20
The average is (1+1+1+1+20) / 5 = 4.8 and NOT (1+20) / 2 = 10.5
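The A/B example above can be checked directly in T-SQL. This is only a minimal sketch: the table variable @Example and its columns Name and Value are made-up names for illustration, not part of the original question.

```sql
-- Hypothetical sample data mirroring the A/B example above
DECLARE @Example TABLE (Name VARCHAR(10), Value INT);
INSERT INTO @Example VALUES ('A', 1), ('A', 1), ('A', 1), ('A', 1), ('B', 20);

SELECT
    -- Average over all five rows: 24 / 5 = 4.8
    (SELECT AVG(CAST(Value AS FLOAT)) FROM @Example) AS OverallAvg,
    -- Average of the two group averages: (1 + 20) / 2 = 10.5
    (SELECT AVG(GroupAvg)
     FROM (SELECT AVG(CAST(Value AS FLOAT)) AS GroupAvg
           FROM @Example
           GROUP BY Name) AS g) AS AvgOfAverages;
```

The two columns differ because group A contributes four rows while group B contributes only one.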
The problem is that you lose information between the two steps of the calculation. Your original calculation is a simple average over all five scores.
After your reduction you are left with only the per-student averages: 10 for A, 15 for B and 20 for C.
The weight of each of these values is different: two scores contribute to A, two to B, but only one to C. This information, while important for calculating the average, is lost. To get the proper average, you additionally need to store the weight of each average, meaning the number of source values behind it. This would be:
Student Value Weight
A 10 2
B 15 2
C 20 1
The weight is simply the count of values for each student. You can extract that easily in one query.
Now your final average calculation should look like this:
(10*2 + 15*2 + 20*1) / (2 + 2 + 1) = 70 / 5 = 14
Selecting the values you need should look something like this:
SELECT Student, AVG(CAST(Score as float)) AvgScore, COUNT(*) Weight
FROM @Table
GROUP BY Grouping sets(Student,())
The rest should be clear: multiply each average by its weight, sum the products, and divide by the sum of the weights.
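As a minimal sketch of that last step, assuming the same sample data as in the question, the weighted average can be computed from the per-student averages like this. The CTE name PerStudent and the alias WeightedAvg are only illustrative.

```sql
DECLARE @Table TABLE (Student VARCHAR(10), Score INT);
INSERT INTO @Table VALUES ('A', 10), ('B', 20), ('A', 10), ('C', 20), ('B', 10);

WITH PerStudent AS (
    -- One row per student: its average and how many scores went into it
    SELECT Student,
           AVG(CAST(Score AS FLOAT)) AS AvgScore,
           COUNT(*) AS Weight
    FROM @Table
    GROUP BY Student
)
-- (10*2 + 15*2 + 20*1) / (2 + 2 + 1) = 70 / 5 = 14
SELECT SUM(AvgScore * Weight) / SUM(Weight) AS WeightedAvg
FROM PerStudent;
```

This reproduces the overall average of 14, whereas a plain AVG(AvgScore) over the same three rows would give 15.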
Is there a relatively simple way to create rows in a table based on a range of dates?
For example, given:
ID  Date_min    Date_max
1   2022-02-01  2022-20-05
2   2022-02-09  2022-02-12
I want to output:
ID  Date_in_Range
1   2022-02-01
1   2022-02-02
1   2022-02-03
1   2022-02-04
1   2022-02-05
2   2022-02-09
2   2022-02-10
2   2022-02-11
2   2022-02-12
I saw a solution for when the range is integer based (How to create rows based on the range of all values between min and max in Snowflake (SQL)?).
But in order to use that approach, GENERATOR(ROWCOUNT => 1000), I have to convert my dates to integers and back, and it gets very messy very quickly, especially since I need to apply this to millions of rows.
So I was wondering if there is a simpler way to do it when dealing with dates instead of integers. Any hints anyone can provide?
Another approach, without using GENERATOR:
with data (ID,Date_min,Date_max) as (
select * from values
(1,to_date('2022-02-01','YYYY-DD-MM'),to_date('2022-20-05','YYYY-DD-MM')),
(2,to_date('2022-02-09','YYYY-DD-MM'),to_date('2022-02-12','YYYY-DD-MM'))
)
select id,
Date_min,
Date_max,
dateadd(day, index, Date_min) day_slots from data,
table(split_to_table(repeat(',',datediff(day, Date_min, Date_max)-1),','));
SQL that includes the first date:
with data (ID,Date_min,Date_max) as (
select * from values
(1,to_date('2022-02-01','YYYY-DD-MM'),to_date('2022-20-05','YYYY-DD-MM')),
(2,to_date('2022-02-09','YYYY-DD-MM'),to_date('2022-02-12','YYYY-DD-MM'))
)
select id,
dateadd(month, index-1, Date_min) day_slots from data,
table(split_to_table(repeat(',',datediff(month, Date_min, Date_max)),','));
"But in order to use that approach GENERATOR(ROWCOUNT => 1000) I have to convert my dates to integers and back, and it just gets very messy very quick, especially since I need to apply this to millions of rows."
There is no need to convert dates to integers and back; a simple DATEADD('day', num, start_date) is enough.
Pseudocode:
WITH sample_data(id, date_min, date_max) AS (
SELECT 1, '2022-02-01'::DATE, '2022-02-05'::DATE
UNION
SELECT 2, '2022-02-09'::DATE, '2022-02-12'::DATE
) , numbers AS (
SELECT ROW_NUMBER() OVER(ORDER BY SEQ4())-1 AS num -- 0 based
FROM TABLE(GENERATOR(ROWCOUNT => 1000)) -- should match max anticipated span
)
SELECT s.id, DATEADD(DAY, n.num, s.date_min) AS calculated_date
FROM sample_data AS s
JOIN numbers AS n
ON DATEADD('DAY', n.num, s.date_min) BETWEEN s.date_min AND s.date_max
ORDER BY s.id, calculated_date;
I have a table my_table of the form
rowNumber number ...
1 23
2 14
3 15
4 25
5 19
6 21
7 19
8 37
9 31
...
1000 28
and I want to find the maximum length of an increasing consecutive sequence in the column number. For this example, it is 3:
14, 15, 25
My idea is to calculate such length for each number:
rowNumber number ... length
1 23 1
2 14 1
3 15 2
4 25 3
5 19 1
6 21 2
7 19 1
8 37 2
9 31 1
...
and then take the maximum. To calculate length, I wrote the following query that uses recursion:
with enhanced_table as (
    select *
          ,1 length
    from my_table
    where rowNumber = 1
    union all
    select b.*
          ,case when b.number > a.number
                then a.length + 1
                else 1 -- restart the count when the sequence stops increasing
           end length
    from enhanced_table a, my_table b
    where b.rowNumber = a.rowNumber + 1
)
select max(length)
from enhanced_table
So, I'm trying to start from rowNumber = 1 and add all other rows consecutively by recursion. I'm getting the error "the maximum recursion 100 has been exhausted before statement completion".
My question is: should I find a way to increase the maximum number of iterations allowed on the server (given that the query is simple, I think running 1000 iterations won't be a problem), or find another approach?
Also, isn't 100 iterations too low a threshold?
Thank you!
There has to be some default threshold, and 100 is what Microsoft chose. It is there to prevent infinite loops. Besides, looping doesn't perform well in SQL Server and goes against its set-based nature.
You can specify the maximum recursion for an individual query with a query hint. This overrides the default.
select max(length)
from enhanced_table
option (maxrecursion 1000)
Note: option (maxrecursion 0) means unlimited, and can cause an infinite loop.
REFERENCE
An incorrectly composed recursive CTE may cause an infinite loop. For
example, if the recursive member query definition returns the same
values for both the parent and child columns, an infinite loop is
created. To prevent an infinite loop, you can limit the number of
recursion levels allowed for a particular statement by using the
MAXRECURSION hint and a value between 0 and 32,767 in the OPTION
clause of the INSERT, UPDATE, DELETE, or SELECT statement. This lets
you control the execution of the statement until you resolve the code
problem that is creating the loop. The server-wide default is 100.
When 0 is specified, no limit is applied. Only one MAXRECURSION value
can be specified per statement
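Since the answer above notes that looping goes against SQL Server's set-based nature, here is one possible set-based alternative to the recursive CTE, sketched with window functions. It is not the original poster's method: it assumes SQL Server 2012 or later (for LAG), and the names @my_table, starts_run, run_id and run_length are made up for illustration. The sample rows are the nine shown in the question.

```sql
DECLARE @my_table TABLE (rowNumber INT PRIMARY KEY, number INT);
INSERT INTO @my_table VALUES
    (1, 23), (2, 14), (3, 15), (4, 25), (5, 19), (6, 21), (7, 19), (8, 37), (9, 31);

WITH flagged AS (
    -- 1 when a row starts a new run (first row, or value not greater than the previous one)
    SELECT rowNumber,
           CASE WHEN number > LAG(number) OVER (ORDER BY rowNumber)
                THEN 0 ELSE 1 END AS starts_run
    FROM @my_table
), grouped AS (
    -- Running sum of the flags gives every run its own id
    SELECT rowNumber,
           SUM(starts_run) OVER (ORDER BY rowNumber ROWS UNBOUNDED PRECEDING) AS run_id
    FROM flagged
)
-- Length of the longest run; 3 for the sample rows (14, 15, 25)
SELECT MAX(run_length) AS max_length
FROM (SELECT COUNT(*) AS run_length
      FROM grouped
      GROUP BY run_id) AS runs;
```

Because nothing here recurses, the MAXRECURSION limit does not come into play regardless of the number of rows.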
If you want to set the maxrecursion value from a variable declared at the beginning of the query, you could build the query dynamically, something like:
DECLARE @Query NVARCHAR(MAX)
SET @Query = N'
;WITH foo AS (
...
)
SELECT * FROM foo
OPTION (MAXRECURSION ' + CAST(@maxrec AS NVARCHAR) + ');'
and then execute it using EXEC.
You could also refer to this answer: Maxrecursion parameter
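The dynamic SQL idea above can be sketched end to end as follows. The counting CTE n, the alias last_i and the value 1000 are placeholders chosen for illustration; the point is only that the MAXRECURSION value is concatenated into the statement text before it is executed.

```sql
DECLARE @maxrec INT = 1000;  -- assumed limit, adjust as needed

-- Build the statement text with the limit spliced into the OPTION clause
DECLARE @Query NVARCHAR(MAX) = N'
WITH n AS (
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 500   -- 499 recursion levels, above the default of 100
)
SELECT MAX(i) AS last_i FROM n
OPTION (MAXRECURSION ' + CAST(@maxrec AS NVARCHAR(10)) + N');';

EXEC sp_executesql @Query;
```

With the default limit of 100 this statement would fail; with the limit of 1000 spliced in, it completes and returns 500.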
I've seen other similar questions around here, but they don't quite meet my needs, at least that's what I think.
I have a [reciepts] table with the following columns:
reciept_id,
customer_id,
ammount
...
Let's say I have 5 unpaid receipts from customer 1:
reciept_id: 1 | Ammount: 110€
reciept_id: 2 | Ammount: 110€
reciept_id: 3 | Ammount: 130€
reciept_id: 4 | Ammount: 110€
reciept_id: 5 | Ammount: 190€
So, customer 1 pays me 220€.
Now I need to select the oldest receipts until this 220€ sum is met, but only in consecutive order, like (receipt 1 + receipt 2) and NOT like (receipt 1 + receipt 4).
Can you help me with the best query for this, or at least point me to the best answer out there?
Thanks in advance :)
Assuming that the sum matches the rows in sequence, the following query will work.
DECLARE @Table TABLE(Reciept_Id INT , Amount INT)
INSERT INTO @Table
SELECT *
FROM (
VALUES (1, 110),(2,110),(3,130),(4,110),(5,190)
) t (Reciept_Id, Amount)
--Query
SELECT * FROM
(
SELECT
Reciept_Id,
Amount,
SUM(Amount) OVER(ORDER BY Reciept_Id ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Total
FROM @Table
) T
WHERE T.Total <= 220
Output:
Reciept_Id Amount Total
----------- ------- ----------
1 110 110
2 110 220
Note: The query works in SQL Server 2012 and later versions.
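The question's table also has a customer_id column, which the query above leaves out. A possible extension, sketched under the assumption that receipts are ordered by reciept_id within each customer, partitions the running total per customer. The table variable @Receipts, the variables @customer_id and @paid, the alias running_total, and the extra row for customer 2 are all made up for illustration.

```sql
DECLARE @Receipts TABLE (reciept_id INT, customer_id INT, ammount INT);
INSERT INTO @Receipts VALUES
    (1, 1, 110), (2, 1, 110), (3, 1, 130), (4, 1, 110), (5, 1, 190),
    (6, 2, 50);  -- hypothetical receipt of another customer

DECLARE @customer_id INT = 1, @paid INT = 220;

SELECT reciept_id, ammount, running_total
FROM (
    SELECT reciept_id, customer_id, ammount,
           -- Running total restarts for every customer
           SUM(ammount) OVER (PARTITION BY customer_id
                              ORDER BY reciept_id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM @Receipts
) AS r
WHERE customer_id = @customer_id
  AND running_total <= @paid
ORDER BY reciept_id;
```

For customer 1 and a payment of 220 this selects receipts 1 and 2. Like the original query, it only returns receipts that are fully covered by the payment.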
Does anyone know an efficient way to write a query that compares a current month's revenue to the average monthly revenue for the past 6 months?
Here's an example with just 2 columns, the actual month and the month's revenue
Columns:
MonthYear RevenueAmt
Jan2017 120
Dec2016 75
Nov2016 50
Oct2016 100
Sep2016 75
Aug2016 100
Jul2016 100
So the average of the previous 6 months (Jul to Dec) is
(75 + 50 + 100 + 75 + 100 + 100) = 500
500 / 6 = 83.33
The current month (Jan2017) is 120,
so the difference becomes:
120 - 83.33 = 36.67
So, Jan2017 is 36.67 higher than the average of its past 6 months.
You can use window functions and set the frame via ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING.
This is a rolling variance, and I made one modification: I used an actual date so we can set the proper ORDER BY in the OVER clause.
Edit: I added the Prior6MthAvg column to illustrate the math
Declare @YourTable table (MonthYear Date,RevenueAmt int)
Insert Into @YourTable values
('2017-01-01',120),
('2016-12-01',75),
('2016-11-01',50),
('2016-10-01',100),
('2016-09-01',75),
('2016-08-01',100),
('2016-07-01',100)
Select A.*
,Prior6MthAvg = avg(RevenueAmt+0.0) over (Order By MonthYear ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
,Variance = RevenueAmt-avg(RevenueAmt+0.0) over (Order By MonthYear ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
From @YourTable A
Order by MonthYear Desc
Returns
MonthYear RevenueAmt Prior6MthAvg Variance
2017-01-01 120 83.333333 36.666667
2016-12-01 75 85.000000 -10.000000
2016-11-01 50 93.750000 -43.750000
2016-10-01 100 91.666666 8.333334
2016-09-01 75 100.000000 -25.000000
2016-08-01 100 100.000000 0.000000
2016-07-01 100 NULL NULL
I was going to comment, but this is more of an answer.
Please note: efficiency, and planning for it, requires a broader view of the design. That said, in general, and assuming a larger amount of data, the most efficient way (in my experience) is to keep a separate table for the "running averages."
To explain, I would keep a separate table (a single record if there is only one set of data, i.e. not company based), keyed by current month, with the given average AND the 6th month total.
Once done, add a trigger that fires when a new month is added (again, in general, or by company, if divided) and that subtracts the 6th month and adds the current one.
Then the total, the average, or both are stored separately and can be accessed quickly.
Once done, a simple join will bring the total/average into any given query.
Edited to add: (NOTE: my T-SQL may be rusty, but this should give you the idea)
NOTE: it assumes a base table ClientData with a secondary table SalesAvg, linked by id (of the rep) and the date (the month marker; if the date can vary, you would need to split out month/year for the key and link). Pulling the average from the same table as given basically puts the load on the server at query time. This method moves the work to the point of insertion (normally more spread out) and, as the average is keyed, allows the quickest retrieval using an inner join.
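CREATE TRIGGER UpdateSalesAvg
ON schema.ClientData
AFTER INSERT
AS
-- Assumes single-row inserts; T-SQL exposes the new row through the "inserted" pseudo-table.
DECLARE @ID as INT;
DECLARE @Date as DATE;
DECLARE @Value as DECIMAL(18, 2); -- type of the value column is an assumption
DECLARE @Count as INT = 0;        -- stays 0 when no SalesAvg row exists yet
SELECT @ID = id, @Date = date, @Value = value FROM inserted;
SELECT @Count = ISNULL(RunCount, 0) FROM schema.SalesAvg
WHERE id = @ID AND monthdate = @Date;
IF (@Count = 0)
BEGIN
    -- First value for this key: start the running average
    INSERT INTO schema.SalesAvg (id, monthdate, RunAvg, RunCount, LastPost)
    VALUES (@ID, @Date, @Value, 1, @Value);
END
ELSE IF (@Count = 6)
BEGIN
    -- Window is full: drop the stored LastPost value and add the new one
    UPDATE schema.SalesAvg
    SET RunAvg = ROUND(((RunAvg * 6) - LastPost + @Value) / 6, 2),
        LastPost = @Value
    WHERE id = @ID
    AND monthdate = @Date;
END
ELSE
BEGIN
    -- Window not yet full: extend the average by one more value
    UPDATE schema.SalesAvg
    SET RunAvg = ROUND( (RunAvg * RunCount + @Value) / (RunCount + 1), 2 ),
        RunCount = RunCount + 1,
        LastPost = @Value
    WHERE id = @ID
    AND monthdate = @Date;
END
GO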
I have a table in SQL Server which has data values and a timestamp for every 15-minute interval, from Jan 2015 to June 2016. Every set of 96 intervals (15-minute intervals, so 96 in 24 hours) has one data value which is the highest of that day. I have another 3 columns called a, b, c. I need to find the max value of every 96-interval set and the respective a, b, c values. I tried to use max and group by but couldn't get the exact number. Can somebody please help me with this?
I have posted a snapshot of how the data looks. In this case you are looking at Jan 02: I need the maximum value of KW over the whole 96 intervals and, more importantly, the phase a, phase b and phase c values corresponding to that max value.
I am assuming that you want the maximum each day. You can use row_number() for this purpose:
select t.*
from (select t.*,
row_number() over (partition by cast(timestamp as date)
order by onedatavalue desc
) as seqnum
from t
) t
where seqnum = 1;
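To make the row_number() approach above concrete, here is a small self-contained sketch. The column names KW and KEY_POINT_DTTM_15MIN come from the thread; the table variable @Readings, the columns PHASE_A, PHASE_B and PHASE_C, and all sample values are assumptions for illustration, since the real schema was only shown in a screenshot.

```sql
DECLARE @Readings TABLE (
    KEY_POINT_DTTM_15MIN DATETIME,
    KW INT,
    PHASE_A INT, PHASE_B INT, PHASE_C INT   -- hypothetical phase columns
);
INSERT INTO @Readings VALUES
    ('2015-01-02T00:00:00', 40, 13, 13, 14),
    ('2015-01-02T00:15:00', 55, 18, 18, 19),
    ('2015-01-03T00:00:00', 70, 23, 23, 24),
    ('2015-01-03T00:15:00', 60, 20, 20, 20);

SELECT KEY_POINT_DTTM_15MIN, KW, PHASE_A, PHASE_B, PHASE_C
FROM (
    SELECT r.*,
           -- Rank readings within each calendar day, highest KW first
           ROW_NUMBER() OVER (PARTITION BY CAST(KEY_POINT_DTTM_15MIN AS DATE)
                              ORDER BY KW DESC) AS seqnum
    FROM @Readings AS r
) AS ranked
WHERE seqnum = 1
ORDER BY KEY_POINT_DTTM_15MIN;
```

For the sample rows this returns the 00:15 reading (KW 55) for Jan 02 and the 00:00 reading (KW 70) for Jan 03, each with its own phase values. If two readings tie for the daily maximum, ROW_NUMBER() keeps only one of them; RANK() would keep both.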
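Assuming you want the max value for each day, you can use MAX within a correlated subquery like this:
select * from DATATABLE as D
where KW = (Select MAX(KW) from DATATABLE
where CAST(KEY_POINT_DTTM_15MIN as date) = CAST(D.KEY_POINT_DTTM_15MIN as date))
I guess you were using max() but did not restrict it to a single calendar day. Note that DAY() alone returns only the day of the month (1 to 31), so it would mix up the same day number from different months; casting to date avoids that.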