Weighted average calculation in MySQL? - database

I am currently using the following query to get some numbers:
SELECT gid, count(gid), (SELECT cou FROM size WHERE gid = infor.gid)
FROM infor
WHERE id = 4325
GROUP BY gid;
The output I am getting at my current stage is the following:
+----------+-----------------+---------------------------------------------------------------+
| gid | count(gid) | (SELECT gid FROM size WHERE gid=infor.gid) |
+----------+-----------------+---------------------------------------------------------------+
| 19 | 1 | 19 |
| 27 | 4 | 27 |
| 556 | 1 | 556 |
+----------+-----------------+---------------------------------------------------------------+
I am trying to calculate the weighted average i.e.
(1*19+4*27+1*556)/(19+27+556)
Is there a way to do this using a single query?

Use:
SELECT SUM(x.num * x.gid) / SUM(x.cou)
FROM (SELECT i.gid,
COUNT(i.gid) AS num,
s.cou
FROM infor i
LEFT JOIN SIZE s ON s.gid = i.gid
WHERE i.id = 4325
GROUP BY i.gid) x

You could place your original query as a sub-query and SUM the records. I could not test this as I don't have the dataset you do, but it should work in theory ;)
SELECT SUM(weights)/SUM(gid) AS calculated_average FROM (
SELECT gid, (COUNT(gid) * gid) AS weights
FROM infor
WHERE id = 4325
GROUP BY gid) AS x;
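To check the arithmetic from the question, here is a minimal sketch that runs the sub-query approach against an in-memory SQLite database from Python. The rows are hypothetical, chosen only to reproduce the counts shown in the question (gid 19 once, 27 four times, 556 once); the `* 1.0` is there because SQLite, unlike MySQL, does integer division on integer operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE infor (id INTEGER, gid INTEGER)")

# Hypothetical data: gid 19 once, gid 27 four times, gid 556 once for id 4325,
# plus one row for another id that the WHERE clause should ignore.
rows = [(4325, 19)] + [(4325, 27)] * 4 + [(4325, 556)] + [(9999, 27)]
conn.executemany("INSERT INTO infor VALUES (?, ?)", rows)

query = """
SELECT SUM(weighted) * 1.0 / SUM(gid) AS weighted_average
FROM (
    SELECT gid, COUNT(gid) * gid AS weighted
    FROM infor
    WHERE id = ?
    GROUP BY gid
) AS x
"""
weighted_average = conn.execute(query, (4325,)).fetchone()[0]

# Same value as (1*19 + 4*27 + 1*556) / (19 + 27 + 556)
print(weighted_average)
```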

Related

SQL Server find sum of values based on criteria within another table

I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Values between the MinYear and MaxYear foreach ID in the aggregated table, but I am having trouble determining a proper query.
The final table should look something like this
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins to create the second table. But then I use a fast forward cursor to iterate through each record of the second table with the code inside the for loop looking like the following
DECLARE @curMin int
DECLARE @curMax int
DECLARE @curID int
FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT Sum(Value) FROM ValTable WHERE Year >= @curMin and Year <= @curMax and ID = @curID
Group By ID
FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
END
Having found the sum of values between the specified years, I can connect it back to the second table and I wind up with the desired result (the third table).
However, the second table in reality is roughly 4 million rows, so this iteration is extremely time consuming (roughly 300 results a minute) and presumably not the best solution.
My question is, is there a way to generate the third table's results without having to use a cursor/for loop?
During a group by the sum will only be for the ID in question -- since the min year and max year is for the ID itself then you don't need to double query. The query below should give you exactly what you need. If you have a different requirement let me know.
SELECT ID, MIN(YEAR) as MinYear, MAX(YEAR) as MaxYear, SUM(VALUE) as SUMVALUE
FROM tablenameyoudidnotsay
GROUP BY ID
You could use a query like the one below.
TableA is your first table, and TableB is the second one
SELECT *,
(select SUM(Value) FROM TableA where tablea.ID=TableB.ID AND tableA.Year BETWEEN
TableB.MinYear AND TableB.MaxYear) AS SumValue
from TableB
You can put your criteria into a join and obtain the result all as one set which should be faster:
SELECT b.Id, b.MinYear, b.MaxYear, sum(a.Value)
FROM Table2 b
JOIN Table1 a ON a.Id=b.Id AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.Id, b.MinYear, b.MaxYear
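As a quick check of the join-based approach, the sketch below loads the sample rows from the question into an in-memory SQLite database from Python. The table name `YearRange` is a placeholder for the derived MinYear/MaxYear table, which the question does not name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ValTable (ID INTEGER, Year INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO ValTable VALUES (?, ?, ?)", [
    (1, 2006, 100), (1, 2007, 200), (1, 2008, 150), (1, 2009, 250),
    (2, 2005, 50), (2, 2006, 75), (2, 2007, 65),
])

# YearRange is a hypothetical name for the aggregated second table.
conn.execute("CREATE TABLE YearRange (ID INTEGER, MinYear INTEGER, MaxYear INTEGER)")
conn.executemany("INSERT INTO YearRange VALUES (?, ?, ?)", [
    (1, 2006, 2009), (2, 2005, 2007),
])

# One set-based query replaces the row-by-row cursor loop.
query = """
SELECT b.ID, b.MinYear, b.MaxYear, SUM(a.Value) AS SumVal
FROM YearRange b
JOIN ValTable a
  ON a.ID = b.ID AND a.Year BETWEEN b.MinYear AND b.MaxYear
GROUP BY b.ID, b.MinYear, b.MaxYear
ORDER BY b.ID
"""
range_sums = conn.execute(query).fetchall()
print(range_sums)  # [(1, 2006, 2009, 700), (2, 2005, 2007, 190)]
```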

Rank by top customers within each separate month

I am having trouble ranking top customers by month. I created a new Rank column - but how do I break it up by month? Any help plz. Code and tables below:
The logic for ranking is selecting the top two customers per month from the tables. Also wrapped into the code (attempted at least) is renaming the date field and setting it to reflect end of month date only.
SELECT * FROM table1;
UPDATE table1
SET DATE=EOMONTH(DATE) AS MO_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER SALES;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
Starting with
+------+----------+-------+--+
| CUST | DATE | SALES | |
+------+----------+-------+--+
| 36 | 3-5-2018 | 50 | |
| 37 | 3-15-18 | 100 | |
| 38 | 3-25-18 | 65 | |
| 37 | 4-5-18 | 95 | |
| 39 | 4-21-18 | 500 | |
| 40 | 4-45-18 | 199 | |
+------+----------+-------+--+
desired end result
+------+---------+-------+------+--+
| CUST | MO_END | SALES | RANK | |
+------+---------+-------+------+--+
| 37 | 3-31-18 | 100 | 1 | |
| 38 | 3-25-18 | 65 | 2 | |
| 39 | 4-30-18 | 500 | 1 | |
| 40 | 4-45-18 | 199 | 2 | |
+------+---------+-------+------+--+
As a simple selection:
select *
from (
select
table1.*
, DENSE_RANK() OVER(PARTITION BY EOMONTH(DATE) ORDER BY sales DESC) as ranking
from table1
) ranked
where ranking < 3
;
If storing is important: I would not use [rank] as a column name as I avoid any words that are used in SQL, maybe [sales_rank] or similar.
with cte as (
select
cust
, sales_rank
, DENSE_RANK() OVER(PARTITION BY EOMONTH(DATE) ORDER BY sales DESC) as ranking
from table1
)
update cte
set sales_rank = ranking
where ranking < 3
;
There is really no reason to store the end of month, just use that function within the partition of the over() clause.
LIMIT 2 is not something that can be used in SQL Server by the way, and it sure can't be used "per grouping". When you use a "window function" such as rank() or dense_rank() you can use the output of those in the where clause of the next "layer". i.e. use those functions in a subquery (or cte) and then use a where clause to filter rows by the calculated values.
Also note I used dense_rank() to guarantee that no rank numbers are skipped, so that the subsequent where clause will be effective.
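The "rank in a subquery, filter in the outer layer" pattern can be tried out with a small sketch like the one below, run from Python against in-memory SQLite (window functions need SQLite 3.25 or later). It is only an approximation of the SQL Server version: SQLite has no EOMONTH, so the month is derived with `strftime('%Y-%m', ...)`, the column name `sale_date` is made up for illustration, and the invalid 4-45-18 date from the sample is assumed to be 2018-04-25.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cust INTEGER, sale_date TEXT, sales INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (36, "2018-03-05", 50),
    (37, "2018-03-15", 100),
    (38, "2018-03-25", 65),
    (37, "2018-04-05", 95),
    (39, "2018-04-21", 500),
    (40, "2018-04-25", 199),  # assumed date; the sample shows 4-45-18
])

# Inner query ranks customers within each month; outer query keeps the top two.
query = """
SELECT month, cust, sales, sales_rank
FROM (
    SELECT strftime('%Y-%m', sale_date) AS month,
           cust,
           sales,
           DENSE_RANK() OVER (
               PARTITION BY strftime('%Y-%m', sale_date)
               ORDER BY sales DESC
           ) AS sales_rank
    FROM table1
) AS ranked
WHERE sales_rank < 3
ORDER BY month, sales_rank
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

Partitioning by month alone is what makes the ranking restart each month; adding the customer to the partition would put each customer in a group of their own.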

Counts for specific properties

I have the following query which I know is incorrect syntax
SELECT Vender as Carrier,
count(IsPup WHERE IsPup = 1) as PU,
count(IsFull WHERE IsFull = 1) as FU,
count(*) as NUM, count(IsPup)/2 + Count(IsFull) as FTE
FROM Trailers WHERE Completed = 0 group by Vender order by NUM;
In particular the count(IsPup WHERE IsPup = 1) is wrong, I've searched various phrases like "How to count multiple properties of rows in SQL" etc.
and tried other manipulations of the same query like count(IsPup) as PU, count(IsFull) as FU
I had the syntactically correct query
SELECT Vender as Carrier,
count(IsPup) as PU,
count(IsFull) as FU,
count(*) as NUM,
count(IsPup)/2 + Count(IsFull) as FTE
FROM Trailers WHERE Completed = 0 group by Vender order by NUM
This runs, but PU, FU, and NUM always come out as the same value...
I'm trying to get a table like below
| Carrier | PU | FU | NUM | FTE |
--------------------------------------------
| Vender1 | 2 | 1 | 3 | 2 |
| Vender2 | 0 | 4 | 4 | 4 |
| TOTAL | 2 | 5 | 7 | 6 |
The trailers table has IsPup and IsFull as the BIT type so they are true or false (0 or 1)
I thought this query would be simple and feel like I am missing something obvious
How do I get the counts of each separate property and the total count?
The duplicate question marked doesn't match the format with the total on the bottom.
SELECT
VENDER
, SUM(CAST(IsPUP AS INT)) AS PU
, SUM(CAST(IsFull AS INT)) AS FU
, COUNT(*) AS NUM
, SUM(CAST(IsPUP AS INT)) * .5 + SUM(CAST(IsFull AS INT)) AS FTE
FROM Trailers
WHERE COMPLETED = 0
GROUP BY VENDER
WITH ROLLUP
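The reason COUNT(IsPup) matched COUNT(*) is that COUNT(column) counts every non-NULL value, including 0, whereas SUM over a 0/1 flag counts only the rows where it is set. The sketch below illustrates this from Python with in-memory SQLite; the rows are hypothetical, chosen to reproduce the desired table, and because SQLite has no WITH ROLLUP the total row is computed by a second query without GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Trailers "
    "(Vender TEXT, IsPup INTEGER, IsFull INTEGER, Completed INTEGER)"
)
# Hypothetical rows: Vender1 has two pups and one full, Vender2 has four fulls.
rows = (
    [("Vender1", 1, 0, 0)] * 2
    + [("Vender1", 0, 1, 0)]
    + [("Vender2", 0, 1, 0)] * 4
)
conn.executemany("INSERT INTO Trailers VALUES (?, ?, ?, ?)", rows)

aggregates = """
    SUM(IsPup) AS PU,
    SUM(IsFull) AS FU,
    COUNT(*) AS NUM,
    SUM(IsPup) * 0.5 + SUM(IsFull) AS FTE
"""
per_vendor = conn.execute(
    f"SELECT Vender, {aggregates} FROM Trailers "
    "WHERE Completed = 0 GROUP BY Vender ORDER BY Vender"
).fetchall()
total = conn.execute(
    f"SELECT 'TOTAL', {aggregates} FROM Trailers WHERE Completed = 0"
).fetchone()

# COUNT(IsPup) counts the 0 rows too, so it equals COUNT(*) here.
count_check = conn.execute(
    "SELECT COUNT(IsPup), COUNT(*) FROM Trailers WHERE Vender = 'Vender1'"
).fetchone()

for row in per_vendor + [total]:
    print(row)
print(count_check)
```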

Calculating sum of differences from each group

I have the following table:
Sensor | building | Date_time | Current_value
1 | 1 | 20.08.2017 | 20
1 | 1 | 21.08.2017 | 25
1 | 1 | 22.08.2017 | 35
2 | 1 | 20.08.2017 | 120
2 | 1 | 21.08.2017 | 200
2 | 1 | 22.08.2017 | 210
3 | 2 | 20.08.2017 | 20
3 | 2 | 21.08.2017 | 25
3 | 2 | 22.08.2017 | 85
5 | 2 | 20.08.2017 | 320
5 | 2 | 21.08.2017 | 400
5 | 2 | 22.08.2017 | 410
The sensor ID is assumed to be unique, as is the building ID.
I need to calculate the total value for each building for any given timeframe by subtracting the MIN value from the MAX value for each sensor, then group the sum by each building.
In the above sample it would be
Sensor 1: (35 - 20)=15
Sensor 2: (210-120)=90
Building 1 = 15+90 = 105
(...)
Building 2 = 65+90 = 155
Any pointers in the right direction are greatly appreciated!
You are asking how to calculate the difference between min and max values per sensor, then aggregate the differences per building.
with diffs as (
SELECT Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
FROM SomeTable
GROUP BY Building, Sensor
)
SELECT Building,sum(diff)
FROM diffs
GROUP BY Building
If you want to restrict the time period, you'll have to do so inside the CTE :
with diffs as (
SELECT Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
FROM SomeTable
WHERE Date_Time between @start and @end
GROUP BY Building, Sensor
)
SELECT Building,sum(diff)
FROM diffs
GROUP BY Building
You can convert this query into a user defined function that can be used in other queries :
create function fn_TotalDiffs(@start datetime2(0), @end datetime2(0))
returns table
as
Return (
with diffs as (
select Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
from SomeTable
where Date_Time between @start and @end
Group by Building, Sensor
)
select Building,sum(diff) as Total
from diffs
Group by Building
)
Another option using window function min/max over()
Example
Select Building
,Total = sum(R1)
From (
Select Distinct
Building
,R1 = max([Current_value]) over (Partition By Building,Sensor)
-min([Current_value]) over (Partition By Building,Sensor)
From YourTable
Where Date_time between @Date1 and @Date2
) A
Group By Building
Returns
Building Total
1 105
2 155
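To confirm those totals, here is a small sketch of the CTE approach run from Python against in-memory SQLite with the sample readings from the question. The dates are stored as ISO strings so that BETWEEN compares them correctly as text, and the helper `total_diffs` is a made-up name standing in for the table-valued function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SomeTable "
    "(Sensor INTEGER, Building INTEGER, Date_time TEXT, Current_value INTEGER)"
)
conn.executemany("INSERT INTO SomeTable VALUES (?, ?, ?, ?)", [
    (1, 1, "2017-08-20", 20), (1, 1, "2017-08-21", 25), (1, 1, "2017-08-22", 35),
    (2, 1, "2017-08-20", 120), (2, 1, "2017-08-21", 200), (2, 1, "2017-08-22", 210),
    (3, 2, "2017-08-20", 20), (3, 2, "2017-08-21", 25), (3, 2, "2017-08-22", 85),
    (5, 2, "2017-08-20", 320), (5, 2, "2017-08-21", 400), (5, 2, "2017-08-22", 410),
])

query = """
WITH diffs AS (
    SELECT Building, Sensor,
           MAX(Current_value) - MIN(Current_value) AS diff
    FROM SomeTable
    WHERE Date_time BETWEEN ? AND ?
    GROUP BY Building, Sensor
)
SELECT Building, SUM(diff) AS Total
FROM diffs
GROUP BY Building
ORDER BY Building
"""

def total_diffs(start, end):
    """Sum of per-sensor (max - min) per building within [start, end]."""
    return conn.execute(query, (start, end)).fetchall()

full_range = total_diffs("2017-08-20", "2017-08-22")
two_days = total_diffs("2017-08-20", "2017-08-21")
print(full_range)  # [(1, 105), (2, 155)]
print(two_days)    # [(1, 85), (2, 85)]
```

The date filter sits inside the CTE because MIN and MAX have to be computed over the restricted period, not over all readings.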

TSQL Multiple column unpivot with named rows possible?

I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, eg. one named Bonus which I also want to put in Perc10, Perc25 and Median Rows.
As an alternative, I also made a query with cross apply, but here, it seems as if you can not "force" sort the rows like you can with unpivot. In other words, I can not have a custom sort, but only a sort that is according to a number within the table, if I am correct? At least, here I do get the result like I wish to have, but the rows are in a wrong order and I do not have the rows names like Perc10 etc. which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Am I missing something, or did I do something wrong? I'm using MSSQL 2014, and the output is going into SSRS. Thanks a lot in advance for any hint!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns of the table. In the table, I've got Columns like Salary and Bonus for each JobID. The View then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits I understand that your solution with CROSS APPLY comes back with the right data, but not in the desired shape. You can add constant values to your VALUES rows and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
wrapped.Salary,
wrapped.Bonus
FROM
(
SELECT crossapplied.*
FROM vCalculatedView v
CROSS APPLY (
VALUES
(1,'Perc10',Salary_10, Bonus_10)
, (2,'Perc25',Salary_25, Bonus_25)
, (3,'Median',Salary_Median, Bonus_Median)
) crossapplied (SortOrder,Calculation,Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.SortOrder,
crossapplied.Calculation,
crossapplied.Salary,
crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder
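The idea of attaching a constant label and sort key to each unpivoted row can be tried outside SQL Server as well. The sketch below runs from Python against in-memory SQLite, which has no CROSS APPLY, so it swaps in UNION ALL to produce the three rows; the labelling and ordering work the same way. The single-row stand-in for `vCalculatedView` and its `Bonus_*` values are assumptions taken from the intended output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table for the view, holding one already-computed row per job.
conn.execute("""
CREATE TABLE vCalculatedView (
    JobID TEXT,
    Salary_10 INTEGER, Bonus_10 INTEGER,
    Salary_25 INTEGER, Bonus_25 INTEGER,
    Salary_Median INTEGER, Bonus_Median INTEGER
)
""")
conn.execute("INSERT INTO vCalculatedView VALUES ('1', 25, 5, 35, 10, 27, 8)")

# Each branch adds a constant SortOrder and Calculation label;
# the wrapper SELECT sorts by SortOrder but does not return it.
query = """
SELECT Calculation, Salary, Bonus
FROM (
    SELECT 1 AS SortOrder, 'Perc10' AS Calculation,
           Salary_10 AS Salary, Bonus_10 AS Bonus
    FROM vCalculatedView WHERE JobID = :job
    UNION ALL
    SELECT 2, 'Perc25', Salary_25, Bonus_25
    FROM vCalculatedView WHERE JobID = :job
    UNION ALL
    SELECT 3, 'Median', Salary_Median, Bonus_Median
    FROM vCalculatedView WHERE JobID = :job
) AS wrapped
ORDER BY SortOrder
"""
unpivoted = conn.execute(query, {"job": "1"}).fetchall()
for row in unpivoted:
    print(row)
```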
