Calculating mean from column values - sql-server

I want a result-set as below from a table:
I tried this query:
select sdate, sum(PG)/sum(PT)*100 AS Score, avg(score) as Mean from table
but I am not getting the correct Mean.
Mean is: sum of all scores / total number of scores.
I want to show mean as computed column. In the above result-set, total of scores is 309 and when divided by 4 (total number of rows) it gives 77.25.
I want to display the result as shown in the result-set.

I believe you're looking for something like:
SELECT sdate,
SUM(PG)/SUM(PT)*100 AS Score,
(SELECT AVG(score) FROM table) AS Mean
FROM table
This should set the mean to the average score across the entire table. If you have WHERE clauses for filtering, you would have to place them in both the subquery and the main query.
EDIT
If the original SQL statement has a GROUP BY, as it sounds like it does, then you could use the following query to achieve what you're looking for:
SELECT sdate,
SUM(PG)/SUM(PT)*100 AS score,
(SELECT AVG(score)
FROM (SELECT CAST(SUM(PG)/SUM(PT)*100 AS FLOAT) AS score
FROM table
GROUP BY sdate) scores) AS Mean
FROM table
GROUP BY sdate
It's not pretty, but I believe it'll accomplish what you're looking for.
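One alternative to the nested subquery, on SQL Server 2005 or later, is to apply a windowed AVG on top of the grouped aggregate, so the grouping is written only once. This is a minimal sketch, not the asker's actual schema: the table name dbo.Scores is hypothetical, and the 100.0 multiplier assumes PG and PT may be integer columns, in which case SUM(PG)/SUM(PT) would otherwise truncate.

```sql
-- dbo.Scores is a hypothetical table name; sdate, PG and PT come from the question.
SELECT sdate,
       SUM(PG) * 100.0 / SUM(PT) AS Score,
       -- Window functions are evaluated after GROUP BY,
       -- so this averages the per-date scores across all groups.
       AVG(SUM(PG) * 100.0 / SUM(PT)) OVER () AS Mean
FROM dbo.Scores
GROUP BY sdate;
```

Because the window is computed over the grouped rows, a WHERE clause added to this query filters both the scores and the mean without being repeated.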

DECLARE @mean DECIMAL(5,2);
SELECT @mean = AVG(score) FROM dbo.table;
SELECT sdate, score, Mean = @mean FROM dbo.table;

I think this will work for you.
Since you need to show the average in each row, I am using a subquery to generate that average each time.
create table mean
(
d date,
score int
)
insert into mean values ('01/01/2013',50),('02/01/2013',60)
,('03/01/2013',40),('04/01/2013',30)
,('05/01/2013',20),('06/01/2013',20)
SELECT d,score,(select sum(score)/count(*) from mean) from mean
However, it would be better to compute the average first and use that value in your SELECT statement. The version below also converts to DECIMAL so the division is not truncated to an integer:
SELECT d,score,
ISNULL((select CONVERT(DECIMAL(10,2),CONVERT(DECIMAL(10,2),sum(score))/count(*) ) from mean) ,0) as mean from mean

Related

T-SQL: aggregate function for calculating Nth percentile

I am trying to calculate the Nth percentile of all of the values in a single column in a table. All I want is a scalar, aggregate value that N percent of the values fall below. For instance, if the table has 100 rows where the value is the same as the row index plus one (1 to 100 consecutively), then I'd want this value to tell me that 95% of the values are below 95.
The PERCENTILE_CONT analytic function looks closest to what I want. But if I try to use it like this:
SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER () AS P95
I get one row per row in the table, all with the same value. I could use TOP 1 to just give me one of those rows, but now I've done an additional table scan.
I am not trying to create a wizbang table of results partitioned by some other column in the original table. I just want an aggregate, scalar value.
Edit: I have been able to use PERCENTILE_CONT in a query with a WHERE clause. For example:
DECLARE @P95 INT
SELECT TOP 1 @P95 = (PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER ())
FROM ExampleTable
WHERE LOWER(Color) = 'blue'
SELECT @P95
Including the WHERE clause gives a different result than I got without it.
From what I can tell, you will need to use a subquery here. For example, to find the number of records strictly below the 95th percentile, we can try:
WITH cte AS (
SELECT ValueColumn,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER () AS P95
FROM yourTable
)
SELECT COUNT(*)
FROM cte
WHERE ValueColumn < P95;
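If only the scalar percentile itself is needed, one commonly used workaround is to collapse the identical per-row results with DISTINCT instead of TOP 1. This is a sketch using the ExampleTable and ValueColumn names from the question; whether the resulting plan is cheap enough for a large table is an assumption you would need to check.

```sql
-- PERCENTILE_CONT ... OVER () yields the same value on every row,
-- so DISTINCT reduces the result to a single row.
SELECT DISTINCT
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER () AS P95
FROM ExampleTable;
```

A WHERE clause can be added as in the question; the percentile is then computed over the filtered rows only, which is why the filtered and unfiltered results differ.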

Talend: Get most common value in a column

I have a table with a couple hundred rows. I want to know the most common value of the data in one of the columns. How do I go about that?
I recommend doing it in your SQL query with something like this:
select top 1 column, count(*) cnt
from table
group by column
order by count(*) desc
This syntax has to be adapted to your RDBMS. For instance, in Oracle it would be something like this:
select column from (
select column, count(*)
from table
group by column
order by count(*) desc
) where rownum = 1
If you want to do it in Talend, you can use:
Input -- tAggregateRow -- tSortRow -- tSampleRow -- Output
In tAggregateRow you use a count function to count the frequency of the values in your column, then you sort them in descending order in tSortRow, and finally you take the first row with tSampleRow (just put "1").

Using a running total calculated column in SQL Server table variable

I have inherited a stored procedure that utilizes a table variable to store data, then updates each row with a running total calculation. The order of the records in the table variable is very important, as we want the volume to be ordered highest to lowest (i.e. the running total will get increasingly larger as you go down the table).
My problem is that, during the step where the table variable is updated, the running total is calculated, but not in the order in which the table variable was previously sorted (descending by highest volume).
DECLARE @TableVariable TABLE ([ID], [Volume], [SortValue], [RunningTotal])
--Populate table variable and order by the sort value...
INSERT INTO @TableVariable (ID, Volume, SortValue)
SELECT
[ID], [Volume], ABS([Volume]) as SortValue
FROM
dbo.VolumeTable
ORDER BY
SortValue DESC
--Set TotalVolume variable...
SELECT @TotalVolume = ABS(sum([Volume]))
FROM @TableVariable
--Calculate running total, update rows in table variable...I believe this is where problem occurs?
SET @RunningTotal = 0
UPDATE @TableVariable
SET @RunningTotal = RunningTotal = @RunningTotal + [Volume]
FROM @TableVariable
--Output...
SELECT
ID, Volume, SortValue, RunningTotal
FROM
@TableVariable
ORDER BY
SortValue DESC
The result is that the record with the highest volume, which I would have expected the running total to process first (so that running total = [volume]), somehow ends up much further down the list. The running total seems to be calculated in a random order.
Here is what I would expect to get:
But here is what the code actually generates:
I'm not sure whether there is a way to make the UPDATE statement process the table variable ordered by volume descending. From what I've read so far, it could be an issue with the sorting behavior of a table variable, but I'm not sure how to correct it. Can anyone help?
GarethD provided the definitive link to the multiple ways of calculating running totals and their performance. The correct one is both the simplest and the fastest, 300 times faster than the quirky update. That's because it can take advantage of any indexes that cover the sort column, and because it's a lot simpler.
I repeat it here to make clear how much simpler this is when the database provides the appropriate windowing functions:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
The SUM line means: sum the ticket counts over all (UNBOUNDED) the rows that come before (PRECEDING) the current one, up to and including the current row, when the rows are ordered by date.
That ends up being 300 times faster than the quirky update.
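One detail of the frame clause is worth illustrating: with RANGE, rows that share the same ORDER BY value are treated as peers and all receive the same running total, whereas ROWS accumulates strictly row by row. The sketch below uses a hypothetical table variable @Tickets with made-up values, and assumes SQL Server 2012 or later, where ORDER BY and frame clauses are allowed in aggregate window functions.

```sql
-- @Tickets and its values are hypothetical sample data.
DECLARE @Tickets TABLE ([Date] date, TicketCount int);
INSERT INTO @Tickets VALUES ('2013-01-01', 5), ('2013-01-01', 3), ('2013-01-02', 2);

SELECT [Date],
       TicketCount,
       -- Both 2013-01-01 rows are peers, so both get 8; the last row gets 10.
       SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING) AS RangeTotal,
       -- Accumulates one row at a time; the order within the tied date is not guaranteed.
       SUM(TicketCount) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS RowsTotal
FROM @Tickets
ORDER BY [Date];
```

If the sort column can contain duplicates and a strictly increasing total is wanted, adding a unique tiebreaker column to the ORDER BY makes either form deterministic.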
The equivalent query for VolumeTable would be:
SELECT
ID,
Volume,
ABS(Volume) as SortValue,
SUM(Volume) OVER (ORDER BY ABS(Volume) DESC RANGE UNBOUNDED PRECEDING)
FROM
VolumeTable
ORDER BY ABS(Volume) DESC
Note that this will be a lot faster if there is an index on the sort column (Volume), and ABS isn't used. Applying any function on a column means that the optimizer can't use any indexes that cover it, because the actual sort value is different than the one stored in the index.
If the table is very large and performance suffers, you could create a computed column for the sort value and create an index on it.
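A minimal sketch of that suggestion follows. The column name AbsVolume and the index name are hypothetical; ID and Volume are the columns from the question. GO is the batch separator used by tools such as SSMS and sqlcmd, and is included so the new column exists before the later statements are compiled.

```sql
-- Hypothetical computed column holding the sort value; PERSISTED stores it on disk.
ALTER TABLE dbo.VolumeTable
    ADD AbsVolume AS ABS(Volume) PERSISTED;
GO

-- Hypothetical index that covers the running-total query.
CREATE INDEX IX_VolumeTable_AbsVolume
    ON dbo.VolumeTable (AbsVolume DESC)
    INCLUDE (ID, Volume);
GO

-- The running total can now sort on the indexed column instead of ABS(Volume).
SELECT ID,
       Volume,
       AbsVolume AS SortValue,
       SUM(Volume) OVER (ORDER BY AbsVolume DESC RANGE UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.VolumeTable
ORDER BY AbsVolume DESC;
```

Whether the optimizer actually uses the index depends on the table and its statistics, so it is worth checking the execution plan before relying on it.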
Take a peek at the Window functions offered in SQL
For example
Declare @YourTable table (ID int,Volume int)
Insert Into @YourTable values
(100,1306489),
(125,898426),
(150,907404)
Select ID
,Volume
,RunningTotal = sum(Volume) over (Order by Volume Desc)
From @YourTable
Order By Volume Desc
Returns
ID Volume RunningTotal
100 1306489 1306489
150 907404 2213893
125 898426 3112319
To be clear, @YourTable is for demonstration purposes only. There should be no need to INSERT your actual data into a table variable.
EDIT to Support 2008 (Good news is Row_Number() is supported in 2008)
Select ID
,Volume
,RowNr=Row_Number() over (Order by Volume Desc)
Into #Temp
From @YourTable
Select A.ID
,A.Volume
,RunningTotal = sum(B.Volume)
From #Temp A
Join #Temp B on (B.RowNr<=A.RowNr)
Group By A.ID,A.Volume
Order By A.Volume Desc

Create conditional for COUNT

Forgive me if this has been asked but I have been unable to find a solution for this. I'm using the COUNT function, and would like for all returned values greater than 5 to return as ">5".
It seems that the HAVING clause is the way to go, but I don't know how to make it conditional.
Thanks in advance.
SELECT DISTINCT
Column A
COUNT(*) AS VISITS
FROM TABLE A
GROUP BY
Column A
ORDER BY
Column A
You can use a CASE expression to return the correct label:
SELECT ColumnA, case when count(*) > 5 then '>5' else cast(count(*) as varchar(4)) end as visits
from tableA
group by ColumnA
order by ColumnA

SQL select after where clause

Here is the setup:
Table 1: table_1
column_id
column_12
column_13
column_14
Table 2: table_2
column_id
column_21
column_22
Select statement:
DECLARE @Variable INT
SET @Variable = 300
SELECT b.column_id,
b.column_12,
SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12) AS sum_column_13,
@Variable / nullif(SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12),0) AS divide_var,
(b.column_13*100) / nullif(b.column_14,0) AS divide_column_3
FROM dbo.table_1 b
WHERE b.column_12 IN ('AM','AJ','A-M','A-J','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q');
This works great, all the formulas are working and the correct results are shown.
b.column_id is retrieved
b.column_12 is retrieved
sum_column_13 is equal to the sum of all the column_13 values (partitioned by column_id)
divide_var is equal to a variable divided by sum_column_13
divide_column_13 is equal to column_13 divided by column_14
Now, however, I am trying to retrieve @Variable from table_2 instead of it being static.
Both tables have a column_id, which could link them together. However, this value is not unique.
The actual number for @Variable should come from table_2, by summing all the values of column_21 for each column_id (similar to sum_column_13).
I can make both things work separately, but when I try to combine them (with a JOIN, or an extra SELECT clause) everything goes wild. For example, when using the JOIN statement, the WHERE clause is applied solely to the JOIN statement and not to the SELECT statement. How I imagine it should work is to use the column_id results from the current SELECT, then use them to retrieve the required data from table_2.
I understand my explanation is not very clear. So here is an SQLFiddle.
As you can see the variable right now comes from adding up the two values in table_2.
Hope this helps.
Thanks,
Here is some sample code. Instead of a variable, I use the sum of column_21 per column_id directly, computed in a CTE:
with tbl_2 (col_id, col_sum) as
( select column_id, sum(column_21) from dbo.table_2 group by column_id )
SELECT b.column_id,
b.column_12,
SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12) AS sum_column_13,
tbl_2.col_sum / nullif(SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12),0) AS divide_var,
(b.column_13*100) / nullif(b.column_14,0) AS divide_column_3
FROM dbo.table_1 b
join tbl_2 on b.column_id = tbl_2.col_id
WHERE b.column_12 IN ('AM','AJ','A-M','A-J','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q');

Resources