How can I pivot SUM(Premium) by each quarter between two dates? - sql-server

As you can see in the picture below (Excel), I have two dates: TransEffDate and TransExpDate. How can I break the premium of $490 into quarterly buckets?
How can I achieve the same in SQL?
I have this:
SELECT PolicyNumber,
TransactionEffectiveDate,
TransactionExpirationDate,
Coverage,
WrittenPremium,
CAST(YEAR(TransactionEffectiveDate) as varchar(5))+'.'+ CAST(DATEPART(QUARTER,TransactionEffectiveDate) as varchar(1)) as YearQuarter
FROM PlazaInsuranceWPDataSet
WHERE PolicyNumber ='PACA1000101-00'
ORDER BY PolicyNumber
For the 1st quarter the value will be 0, because TransEffDate starts in the second quarter.
For the 2nd quarter, we need the number of days between TransEffDate and TransExpDate, which is 365, and divide the premium ($490) by 365 days, which gives about $1.34 per day. Then multiply 1.34 by the number of days between TransEffDate and the end of the second quarter (65 days).
So something like this:
WrittenPremium/DATEDIFF(DAY,TransactionEffectiveDate,TransactionExpirationDate) * DATEDIFF(DAY,TransactionEffectiveDate, EndOfQuarter) AS Year_Quarter_1
But how can I get EndOfQuarter dynamically for each PolicyNumber?
There should be some formula for this purpose.
Thanks
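To check the arithmetic described in the question before turning it into SQL, here is a minimal Python sketch (not T-SQL). The function names are hypothetical. It assumes day counts exclude the end date, as DATEDIFF(DAY, ...) does, and treats the "end of quarter" as the first day of the next quarter, which is what yields the 65 days from 2012-04-27.

```python
from datetime import date


def start_of_next_quarter(d: date) -> date:
    """First day of the quarter after d, used here as an exclusive 'end of quarter'."""
    next_q_month = 3 * ((d.month - 1) // 3 + 1) + 1  # 4, 7, 10 or 13
    if next_q_month > 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, next_q_month, 1)


def first_quarter_premium(premium: float, eff: date, exp: date) -> float:
    """Premium earned from eff to the end of eff's quarter, prorated per day.

    Day counts exclude the end date, like DATEDIFF(DAY, start, end).
    """
    total_days = (exp - eff).days
    quarter_end = min(start_of_next_quarter(eff), exp)
    return premium / total_days * (quarter_end - eff).days


eff, exp = date(2012, 4, 27), date(2013, 4, 27)
print(start_of_next_quarter(eff))                      # 2012-07-01
print((exp - eff).days)                                # 365
print(round(first_quarter_premium(490, eff, exp), 2))  # 87.26
```

In T-SQL, the same first-day-of-next-quarter value is commonly written as DATEADD(QUARTER, DATEDIFF(QUARTER, 0, TransactionEffectiveDate) + 1, 0). Using the unrounded daily rate (490/365) rather than $1.34 avoids rounding drift across quarters.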

Consider the following dynamic pivot.
Now, I cheated a bit by dropping the intermediate results in a temp table, but this can be changed if necessary...
By using an ad-hoc tally table in CROSS APPLY the dates and values are allocated correctly via a day-weighted methodology. In other words, the math works.
--Drop Table #TempData
Select A.[PolicyNumber]
,A.[Coverage]
,A.[Premium]
,A.[TransEff]
,A.[TransExp]
,B.*
Into #TempData
From YourTable A
Cross Apply (
Select Qtr = Format(max(DatePart(YY,D)+DatePart(QQ,D)/10.0),'0000.0')
,Value = (A.Premium/(DateDiff(DD,A.TransEff,A.TransExp)+1.0))*count(*)
From (Select Top (DateDiff(DD,A.TransEff,A.TransExp)+1) D=DateAdd(DD,Row_Number() Over (Order By (Select null))-1,A.TransEff) From master..spt_values ) D
Group By DatePart(YY,D),DatePart(QQ,D)
) B
Where PolicyNumber ='PACA1000101-00'
Declare @SQL varchar(max) = Stuff((Select Distinct ',' + QuoteName(Qtr) From #TempData Order by 1 For XML Path('') ),1,1,'')
Select @SQL = '
Select [PolicyNumber],[Coverage],[Premium],[TransEff],[TransExp],' + @SQL + '
From #TempData
Pivot (Sum([Value]) For [Qtr] in (' + @SQL + ') ) p
Order By 1,3'
Exec(@SQL);
Returns
If it helps with the visualization, the temp table looks like the image below. Then it becomes a simple PIVOT.
EDIT - To fix the order of the Qtr columns, notice the Order By 1:
Declare @SQL varchar(max) = Stuff((Select Distinct ',' + QuoteName(Qtr) From #TempData Order by 1 For XML Path('') ),1,1,'')
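The CROSS APPLY above generates one row per day from TransEff to TransExp inclusive (hence the +1), so each day carries 490/366 here rather than the 490/365 used in the question. A rough Python equivalent of that day-weighted allocation, with a hypothetical function name, is:

```python
from collections import defaultdict
from datetime import date, timedelta


def allocate_by_day(premium: float, eff: date, exp: date) -> dict:
    """Spread premium evenly over every day from eff to exp *inclusive*,
    then total the daily amounts per 'YYYY.Q' bucket."""
    n_days = (exp - eff).days + 1            # like DateDiff(DD, TransEff, TransExp) + 1
    per_day = premium / n_days
    days_per_qtr = defaultdict(int)
    for i in range(n_days):                  # the "tally": one row per day
        d = eff + timedelta(days=i)
        days_per_qtr[f"{d.year}.{(d.month - 1) // 3 + 1}"] += 1
    # Equivalent of Value = (Premium / days) * count(*) grouped by year and quarter.
    return {qtr: per_day * n for qtr, n in sorted(days_per_qtr.items())}


alloc = allocate_by_day(490, date(2012, 4, 27), date(2013, 4, 27))
for qtr, value in alloc.items():
    print(qtr, round(value, 2))
```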

Boy, that's tough. Here's one way: create a table with the quarter boundaries in it. You can add dates far into the future.
CREATE TABLE quarters(
lo DATETIME NOT NULL PRIMARY KEY,
hi DATETIME NOT NULL
);
INSERT INTO quarters VALUES ('2012-01-01','2012-04-01');
INSERT INTO quarters VALUES ('2012-04-01','2012-07-01');
INSERT INTO quarters VALUES ('2012-07-01','2012-10-01');
INSERT INTO quarters VALUES ('2012-10-01','2013-01-01');
INSERT INTO quarters VALUES ('2013-01-01','2013-04-01');
INSERT INTO quarters VALUES ('2013-04-01','2013-07-01');
INSERT INTO quarters VALUES ('2013-07-01','2013-10-01');
INSERT INTO quarters VALUES ('2013-10-01','2014-01-01');
Here's one line of policy data
CREATE TABLE Insurance (
policynumber VARCHAR(10) NOT NULL PRIMARY KEY,
premium INT,
TransEff datetime,
TransExp datetime
);
INSERT INTO Insurance VALUES ('PACA1',490,'2012-04-27','2013-04-27');
You can join this with your data table - the join condition is that the periods overlap:
SELECT datepart(YEAR,l1) y,datepart(quarter,l1) q,l1,h1,
CASE WHEN l1>l2 THEN l1 ELSE l2 END AS maxst,
CASE WHEN h1>h2 THEN h2 ELSE h1 END AS minend
FROM
(SELECT policynumber,TransEff,
CAST(lo AS INT) l1,CAST(transeff AS INT) l2,
CAST(hi AS INT) h1,CAST(transexp AS INT) h2
FROM Insurance JOIN quarters ON(hi>transeff AND lo<transexp)
) AS i;
That gives the overlapping dates:
y q l1 h1 maxst minend
2012 2 40998 41089 41024 41089
2012 3 41089 41181 41089 41181
2012 4 41181 41273 41181 41273
2013 1 41273 41363 41273 41363
2013 2 41363 41454 41363 41389
You can now do the subtraction to find how many days apply to each quarter.
SELECT policynumber pn, y, q, minend-maxstart v
FROM(
SELECT policynumber, datepart(YEAR,l1) y,datepart(quarter,l1) q,
CASE WHEN l1>l2 THEN l1 ELSE l2 END AS maxstart,
CASE WHEN h1>h2 THEN h2 ELSE h1 END AS minend
FROM
(SELECT policynumber,TransEff,
CAST(lo AS INT) l1,CAST(transeff AS INT)l2,
CAST(hi AS INT) h1,CAST(transexp AS INT)h2
FROM Insurance JOIN quarters ON(hi>transeff AND lo<transexp)
) AS i
) as x
Which gives...
pn y q v
PACA1 2012 2 65
PACA1 2012 3 92
PACA1 2012 4 92
PACA1 2013 1 90
PACA1 2013 2 26
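The query stops at day counts; multiplying each count by premium / total days finishes the allocation. Here is a minimal Python sketch of the same half-open overlap logic (max of the starts, min of the ends), assuming the quarter boundaries and the single policy row inserted above; the function name is illustrative.

```python
from datetime import date

# Half-open quarter boundaries [lo, hi), matching the quarters table above.
QUARTERS = [
    (date(2012, 1, 1), date(2012, 4, 1)), (date(2012, 4, 1), date(2012, 7, 1)),
    (date(2012, 7, 1), date(2012, 10, 1)), (date(2012, 10, 1), date(2013, 1, 1)),
    (date(2013, 1, 1), date(2013, 4, 1)), (date(2013, 4, 1), date(2013, 7, 1)),
    (date(2013, 7, 1), date(2013, 10, 1)), (date(2013, 10, 1), date(2014, 1, 1)),
]


def overlap_days(eff: date, exp: date) -> dict:
    """Days of [eff, exp) falling in each quarter, keyed by (year, quarter)."""
    days = {}
    for lo, hi in QUARTERS:
        if hi > eff and lo < exp:                      # same join condition as the query
            maxstart, minend = max(lo, eff), min(hi, exp)
            days[(lo.year, (lo.month - 1) // 3 + 1)] = (minend - maxstart).days
    return days


days = overlap_days(date(2012, 4, 27), date(2013, 4, 27))
total = sum(days.values())                              # 365
earned = {k: 490 * v / total for k, v in days.items()}  # prorated premium per quarter
for (y, q), v in days.items():
    print(y, q, v, round(earned[(y, q)], 2))
```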

Related

SQL CPU Script - count consecutive occurrences of value

So I'm working on a SQL CPU utilization script that gets the last N minutes of CPU usage (for example, 10 minutes) for a SQL instance, as available from sys.dm_os_ring_buffers - a pretty standard script.
However, what I want to do is grab this info but count the consecutive occurrences in the sample (i.e. 10 minutes), so that if there are 10 consecutive records where the value is > 90%, I do X.
Here's the code I'm using (edited for correct code):
DECLARE @ts BIGINT;
DECLARE @lastNmin TINYINT;
SET @lastNmin = 10;
SELECT @ts =(SELECT cpu_ticks/(cpu_ticks/ms_ticks) FROM
sys.dm_os_sys_info);
SELECT TOP(@lastNmin)
SQLProcessUtilization AS [SQLServer_CPU_Utilization],
SystemIdle AS [System_Idle_Process],
100 - SystemIdle - SQLProcessUtilization AS
[Other_Process_CPU_Utilization],
DATEADD(ms,-1 *(@ts - [timestamp]),GETDATE())AS [Event_Time]
FROM (SELECT record.value('(./Record/@id)[1]','int')AS record_id,
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]','int')AS [SystemIdle],
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]','int')AS [SQLProcessUtilization],
[timestamp]
FROM (SELECT[timestamp], convert(xml, record) AS [record]
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type =N'RING_BUFFER_SCHEDULER_MONITOR'AND record
LIKE'%%')AS x )AS y
ORDER BY record_id DESC;
Thanks
It sounds like you want a gaps and islands approach. Here's what I came up with:
DROP TABLE IF EXISTS #tmp;
DECLARE @ts BIGINT;
DECLARE @lastNmin TINYINT;
SET @lastNmin = 10;
SELECT @ts =
(
SELECT cpu_ticks / (cpu_ticks / ms_ticks) FROM sys.dm_os_sys_info
);
SELECT TOP (@lastNmin)
SQLProcessUtilization AS [SQLServer_CPU_Utilization],
SystemIdle AS [System_Idle_Process],
100 - SystemIdle - SQLProcessUtilization AS [Other_Process_CPU_Utilization],
DATEADD(ms, -1 * (@ts - [timestamp]), GETDATE()) AS [Event_Time]
INTO #tmp
FROM
(
SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle],
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [SQLProcessUtilization],
[timestamp]
FROM
(
SELECT [timestamp],
CONVERT(XML, record) AS [record]
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
AND record LIKE '%%'
) AS x
) AS y
ORDER BY record_id DESC;
WITH cte AS (
SELECT *, CAST(CASE WHEN [System_Idle_Process] >= 95 THEN 1 ELSE 0 END AS BIT) as [HighCPU]
FROM #tmp
),
GapsAndIslands AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY cte.Event_Time) AS rn1,
ROW_NUMBER() OVER (PARTITION BY cte.HighCPU ORDER BY cte.Event_Time) AS rn2
FROM cte
)
SELECT *, rn1 - rn2 AS GroupID
FROM GapsAndIslands
ORDER BY GapsAndIslands.Event_Time;
By way of explanation, I'm creating three synthetic columns:
a boolean representing the condition you're looking to track (NB: I'm using a different metric than you would, because my CPU usage is low!)
a row number column across the entire data set
a row number column for each distinct value of the tracked metric
What makes this solution work is noting that the difference in those two row number columns will be the same for consecutive rows that have the same value for your tracked metric and will change on the boundaries. I've left that as GroupID in the final result set and you can use that to track groups of consecutive rows.
If you instead replace that last select with this:
SELECT MIN(Event_Time), MAX(Event_Time)
FROM GapsAndIslands
WHERE [HighCPU] = 1
GROUP BY rn1 - rn2
ORDER BY MIN(Event_Time);
That will give you the time ranges for when the tracked metric was above threshold.
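To see why rn1 - rn2 identifies islands, here is a small Python sketch of the same trick, plus the check from the question (a run of at least 10 consecutive samples above 90). The function names are illustrative and the sample values are made up, not real ring-buffer data. The group key includes the flag because a high run and a low run can end up with the same rn1 - rn2 value (the SQL above sidesteps this by filtering on HighCPU = 1 before grouping).

```python
def islands(flags):
    """Group consecutive equal flags with the rn1 - rn2 trick.

    flags must be ordered by event time. Returns (flag, first_index, last_index)
    tuples in time order.
    """
    rn2 = {True: 0, False: 0}   # ROW_NUMBER() OVER (PARTITION BY flag ...)
    groups = {}                 # (flag, rn1 - rn2) -> [first_index, last_index]
    for rn1, flag in enumerate(flags, start=1):   # ROW_NUMBER() over all rows
        rn2[flag] += 1
        key = (flag, rn1 - rn2[flag])             # constant within a run of equal flags
        if key in groups:
            groups[key][1] = rn1 - 1
        else:
            groups[key] = [rn1 - 1, rn1 - 1]
    return sorted(((f, s, e) for (f, _), (s, e) in groups.items()), key=lambda g: g[1])


def longest_high_run(cpu_samples, threshold=90):
    """Length of the longest run of consecutive samples above threshold."""
    runs = [e - s + 1 for f, s, e in islands([v > threshold for v in cpu_samples]) if f]
    return max(runs, default=0)


samples = [95, 96, 97, 40, 92, 93, 91, 99, 20]
print(islands([v > 90 for v in samples]))
print(longest_high_run(samples))       # 4
```

In the question's terms, "do X" would correspond to longest_high_run(...) >= 10 over the last 10 samples.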

TSQL While Loop over months in year

I have an output that I need to achieve and I am not too certain how to go about it.
I first need to start by looping over each month in the year and using that month in a select statement to check for data.
For example:
Select * from table where MONTH(A.[submissionDate]) = 1
Select * from table where MONTH(A.[submissionDate]) = 2
Select * from table where MONTH(A.[submissionDate]) = 3
My end result is to create this XML output to use with a chart plugin. It needs to include the months even if there is no data which is why I wanted to loop through each month to check for it.
<root>
<dataSet>
<areaDesc>Area 1</areaDesc>
<data>
<month>January</month>
<monthValue>1</monthValue>
<submissions>0</submissions>
</data>
<data>
<month>February</month>
<monthValue>2</monthValue>
<submissions>7</submissions>
</data>
<data>
<month>March</month>
<monthValue>3</monthValue>
<submissions>5</submissions>
</data>
</dataSet>
<dataSet>
<areaDesc>Area 2</areaDesc>
<data>
<month>January</month>
<monthValue>1</monthValue>
<submissions>0</submissions>
</data>
<data>
<month>February</month>
<monthValue>2</monthValue>
<submissions>7</submissions>
</data>
<data>
<month>March</month>
<monthValue>3</monthValue>
<submissions>5</submissions>
</data>
</dataSet>
</root>
I may be way overthinking this, but I'm hoping that talking it through may help me out a little.
Here is my current setup for getting some other stats:
--Temp table
DECLARE @areas TABLE (
area VARCHAR (100));
IF @dept = 'global'
OR @dept = ''
BEGIN
INSERT INTO @areas (area)
SELECT DISTINCT(AreaDesc)
FROM dbo.EmpTable;
END
ELSE
BEGIN
INSERT INTO @areas
SELECT @dept;
END
IF (@action = 'compare')
BEGIN
SELECT DATENAME(month, A.[submissionDate]) AS [month],
MONTH(A.[submissionDate]) AS [monthValue],
count(A.[submissionID]) AS submissions,
B.[AreaDesc]
FROM empowermentSubmissions AS A
INNER JOIN empTable AS B
ON A.[nomineeQID] = B.[QID]
WHERE YEAR(A.[submissionDate]) = @year
AND A.[statusID] = 3
AND A.[locationID] IN (SELECT location
FROM @table)
GROUP BY DATENAME(month, A.[submissionDate]), MONTH(A.[submissionDate]), B.[AreaDesc]
ORDER BY [monthValue] ASC
FOR XML PATH ('dataSet'), TYPE, ELEMENTS, ROOT ('root');
END
ELSE
This is a great application for a "Dates" table or view. Create a new table in your database with a schema like:
CREATE TABLE dbo.Dates (
Month INT,
MonthName VARCHAR(20)
)
Populate this table with the years and months you may want to aggregate over. Then, you can make your query like:
SELECT
Area,
Dates.MonthName,
COUNT(Submissions.SubmissionDate) AS Count
FROM
dbo.Dates
LEFT OUTER JOIN
dbo.Submissions
ON Dates.Month = MONTH(Submissions.SubmissionDate)
GROUP BY
Dates.MonthName,
Area
The LEFT OUTER JOIN will give you one row for every Year and Month in the dates table, and a count of any submissions on that month. You end up with output like:
Area | MonthName | Count
Area 1 | Jan | 0
Area 2 | Feb | 2
&c.
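The zero-fill idea can be sketched in Python as follows. One design note: with only a Dates table on the left of the join, a month with no submissions comes back with a NULL Area, so this sketch crosses every area with every month first. The function name and sample rows are hypothetical.

```python
from collections import Counter
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def monthly_counts(submissions, areas, year):
    """Count submissions per (area, month) for one year, emitting all 12 months per area.

    submissions: iterable of (area, submission_date) pairs.
    """
    counts = Counter((area, d.month) for area, d in submissions if d.year == year)
    # Every area crossed with every month, like a Dates table on the left of a LEFT JOIN;
    # months with no submissions fall back to 0 (Counter returns 0 for missing keys).
    return [(area, MONTH_NAMES[m - 1], m, counts[(area, m)])
            for area in areas for m in range(1, 13)]


subs = [("Area 1", date(2014, 2, 3)), ("Area 1", date(2014, 2, 10)),
        ("Area 2", date(2014, 3, 1)), ("Area 2", date(2013, 3, 1))]
for row in monthly_counts(subs, ["Area 1", "Area 2"], 2014)[:3]:
    print(row)
```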
You'll want to do a FOR XML structure to get the exact result set you're looking for in one go, I think. I put this together with what I could glean about your XML. Just change the name of the table variable here to your real table name and this should work.
EDIT: changed up the query to match the definition from the posted query. Updated the data element where clause to maintain month instantiation when zero counts were found in a month.
EDIT: Added Status requirement.
EDIT: Moved areaDesc criteria for constant month output.
declare @empowermentSubmissions table (submissionID int primary key identity(1,1), submissionDate datetime, nomineeQID INT, statusID INT)
declare @empTable table (QID int primary key identity(1,1), AreaDesc varchar(10))
declare @n int = 1
while @n < 50
begin
insert into @empTable (AreaDesc) values ('Area ' + cast((@n % 2)+1 as varchar(1)))
set @n = @n + 1
end
set @n = 1
while @n < 500
begin
insert into @empowermentSubmissions (submissionDate, nomineeQID, StatusID) values (dateadd(dd,-(cast(rand()*600 as int)),getdate()), (select top 1 QID from @empTable order by newid()), 3 + (@n % 2) - (@n % 3) )
set @n = @n + 1
end
declare @year int = 2014
select (
select (
select (
select e1.areaDesc
from @empTable e1
where e1.areaDesc = e2.areaDesc
group by e1.areaDesc
for xml path(''),type
)
, (
select [month], [monthValue], count(s1.submissionID) as submissions
from (
select @year [Year]
, datename(month,dateadd(mm,RowID-1,@year-1900)) [Month]
, month(dateadd(mm,RowID-1,@year-1900)) [MonthValue]
from (
select *, row_number()over(order by name) as RowID
from master..spt_values
) d
where d.RowID <= 12
) t
left join (
select s3.submissionID, s3.submissionDate, e3.AreaDesc
from @empowermentSubmissions s3
inner join @empTable e3 on s3.nomineeQID = e3.QID
where s3.statusID = 3
and e3.areaDesc = e2.areaDesc
) s1 on year(s1.submissionDate) = t.[Year]
and month(s1.submissionDate) = t.[MonthValue]
group by [Month], [MonthValue]
order by [MonthValue]
for xml path('data'),type
)
for xml path(''),type
) dataset
from @empowermentSubmissions s2
inner join @empTable e2 on s2.nomineeQID = e2.QID
group by e2.areaDesc
for xml path(''), type
) root
for xml path (''), type
You should be able to use a tally table to get the months:
SELECT TOP 12 IDENTITY(INT,1,1) AS N
INTO #tally
FROM master.dbo.syscolumns sc1
SELECT DATENAME(MONTH,DATEADD(MONTH,t.N-1,'2014-01-01')) AS namemonth, t.N AS monthvalue, COUNT(tbl.submissionDate) AS submissions, tbl.Area
FROM #tally t
LEFT OUTER JOIN tbl ON MONTH(tbl.submissionDate) = t.N
GROUP BY t.n, tbl.Area
DROP TABLE #tally

Avoid WHILE loop and CURSOR for better performance?

I'm wondering if someone can help simplify this procedure - and improve performance...!?
We have data on grants. 'Donors' give funds to 'Recipients' and we want to show the top 15 recipients for each donor over 3 periods: CurrentYear-20, CurrentYear-10 and CurrentYear. We publish an annual report and show percentage shares of World and GeoZone totals for each donor.
I have "inherited" this code, which was written by one of my predecessors. Until we switched to using a view, execution time was around 15-30 mins. Currently, this runs in just under FOUR hours (scheduled as a SQL Server Agent job)! Management are not happy. For various reasons, the view must continue to be used and currently has just under 900,000 rows with data from the 1950s onwards. We currently run this report for 30 (large) donors and more are added each year.
To help improve performance, I have thought about using a CTE and/or SUM() OVER (PARTITION BY ...), or a combination of these, but I'm not sure how to go about it.
Could someone point me in the right direction?
Here is the process:
create a table (variable) to hold the top 15 recipients for the current donor
create a table (variable) to hold the list of donors
populate the donor table with the donors in the order they appear in the report
loop thru the donor table and for each donor:
put the donor ID for this donor into a temp table
loop 3 times (for CurrentYear-20, CurrentYear-10, CurrentYear)
calculate the share totals for each of 18 regions/zones
print the values for each section in the report
get the next donor ID
As you may see from the above, the calculations are run 54 times (18x3) for each donor!
Here is the code (simplified):
-- @LatestYear is passed as a parameter, hardcoded here for simplicity
DECLARE @LatestYear SMALLINT ,
@CurrentYear SMALLINT ,
@DonorID SMALLINT ,
@totalWorld NUMERIC(10, 2) ,
@LoopCounter TINYINT ,
@DonorName VARCHAR(100)
SELECT @latestyear = 2012
-- create a table to hold list of top 15 recipients for each donor and their 'share' of ODA.
DECLARE @Top15 TABLE
(
Country VARCHAR(100) ,
Percentage REAL
)
-- create a table to hold list of donors, ordered as they need to appear in the report.
DECLARE @PageOrder TABLE
(
DonorID SMALLINT ,
DonorName VARCHAR(100) ,
SortOrder SMALLINT IDENTITY(1, 1)
)
-- create a table to store the "focus" donor.
DECLARE @CurrentDonor TABLE ( DonorID SMALLINT )
INSERT INTO @PageOrder
SELECT DonorID ,
DonorName
FROM dbo.LookupDonor
ORDER BY DonorName;
-- cursor to loop through the donors in SortOrder
DECLARE DonorCursor CURSOR
FOR
SELECT DonorID ,
DonorName
FROM @PageOrder
ORDER BY DonorName;
OPEN DonorCursor
FETCH NEXT FROM DonorCursor INTO @DonorID, @DonorName
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO pubOutput
( XMLText )
SELECT @DonorName;
-- Populate the DonorID table
INSERT INTO @CurrentDonor
VALUES ( @DonorID )
/* The following loop is invoked 3 times. The first time through, the year will be 20 years before the latest year,
the second time through, 10 years before. The last time through the year will be the latest year.
*/
SET @LoopCounter = 1
WHILE @LoopCounter <= 3
BEGIN
SELECT @CurrentYear = CASE @LoopCounter
WHEN 1 THEN @LatestYear - 20
WHEN 2 THEN @LatestYear - 10
ELSE @LatestYear
END
-- calculate the world total for the current years (year,year-1) for all recipients
SELECT @totalWorld = SUM(Amount)
FROM dbo.vData2 d
INNER JOIN ( SELECT RecipientID
FROM dbo.RecipientGroup
WHERE GroupID = 160
) c ON d.RecipientID = c.RecipientID
INNER JOIN @CurrentDonor z ON d.DonorID = z.DonorID
WHERE d.year IN ( @CurrentYear - 1, @CurrentYear )
-- calculate the GeoZones total for the current years (year,year-1)
SELECT @totalGeoZones = SUM(Amount)
FROM dbo.vDac2a d
INNER JOIN ( SELECT RecipientID
FROM dbo.GeoZones
WHERE GeoZoneID = 100
) x ON d.RecipientID = x.RecipientID
INNER JOIN @CurrentDonor z ON d.DonorCode = z.DonorCode
WHERE d.year IN ( @CurrentYear - 1, @CurrentYear )
-- Find the top 15 recipients for the current donor
INSERT INTO @Top15
SELECT TOP 15
r.RecipientName ,
( ISNULL(SUM(Amount), 0) / @totalWorld ) * 100
FROM dbo.vData2 d
INNER JOIN dbo.LookupRecipient r ON r.RecipientID = d.RecipientID
INNER JOIN @CurrentDonor z ON d.DonorID = z.DonorID
WHERE d.year IN ( @CurrentYear - 1, @CurrentYear )
GROUP BY r.RecipientName
ORDER BY 2 DESC
-- Print the top 15 recipients and total
INSERT INTO pubOutput
(
XMLText
)
SELECT country + @Separator + CAST(percentage AS VARCHAR)
FROM @Top15
ORDER BY percentage DESC
INSERT INTO pubOutput
(
XMLText
)
SELECT @Heading1 + @Separator + CAST(SUM(Percentage) AS VARCHAR)
FROM @Top15
-- Breakdown by Regions
-- Region1
IF @totalWorld IS NOT NULL
INSERT INTO pubOutput
(
XMLText
)
SELECT 'Region1' + #Separator
+ CAST(( ISNULL(SUM(Amount), 0) / @totalWorld ) * 100 AS VARCHAR)
FROM dbo.vData2 d
INNER JOIN ( SELECT RecipientID
FROM dbo.RecipientGroup
WHERE RegionID = 1
) c ON d.RecipientID = c.RecipientID
INNER JOIN @CurrentDonor z ON d.DonorID = z.DonorID
WHERE d.year IN ( @CurrentYear - 1, @CurrentYear )
ELSE -- force output of sub-total heading
INSERT INTO pubOutput
(
XMLText
)
SELECT @Heading2 + @Separator + '--'
-- Region2-8
/* similar syntax as Region1 above, for all Regions 2-8 */
-- Total Regions
INSERT INTO pubOutput
(
XMLText
)
SELECT @Heading2 + @Separator + CAST(@totalWorld AS VARCHAR)
-- Breakdown by GeoZones 1-7
-- GeoZone1
INSERT INTO pubOutput
(
XMLText
)
SELECT 'GeoZone1' + #Separator
+ CAST(( ISNULL(SUM(Amount), 0) / @totalGeoZones ) * 100 AS VARCHAR)
FROM dbo.vDac2a d
INNER JOIN ( SELECT RecipientID
FROM dbo.GeoZones
WHERE GeoZoneID = 1
) m ON d.RecipientID = m.RecipientID
INNER JOIN @CurrentDonor z ON d.DonorCode = z.DonorCode
WHERE d.year IN ( @CurrentYear - 1, @CurrentYear )
-- GeoZones2-7
/* similar syntax as GeoZone1 above for GeoZones 2-7 */
-- Total GeoZones - currently hard-coded as 100, due to minor rounding errors
INSERT INTO pubOutput
(
XMLText
)
SELECT @Heading3 + @Separator + '100'
SET @LoopCounter = @LoopCounter + 1
END -- year loop
-- Get the next donor from the cursor
FETCH NEXT FROM DonorCursor
INTO @DonorID, @DonorName
END
-- donorcursor
-- Cleanup
CLOSE DonorCursor
DEALLOCATE DonorCursor
Many thanks in advance for any help you may be able to provide.
Avoiding the cursor is a must. You can use a WHILE loop instead of the cursor; however, considering the complexity of the query, keep the cursor for now.
To improve performance in another way, check the number of records returned by the queries below:
SELECT RecipientID FROM dbo.RecipientGroup WHERE GroupID=160
SELECT RecipientID FROM dbo.GeoZones WHERE GeoZoneID=100
SELECT RecipientID FROM dbo.RecipientGroup WHERE RegionID=1
I suggest creating 3 temp tables for the queries above outside the cursor and using them inside the cursor.
Hope this helps!
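Along the lines of the SUM() OVER (PARTITION BY ...) idea in the question, the key change is to compute every donor and period total in one grouped pass instead of running 54 queries per donor. The Python sketch below only illustrates that shape, under stated assumptions: the row layout and the function name donor_report are hypothetical, each row is assumed to carry its region directly, and the world total is taken as the sum of all of a donor's rows rather than the RecipientGroup GroupID = 160 filter used in the real query.

```python
from collections import defaultdict


def donor_report(rows, latest_year, top_n=15):
    """Compute every donor x period share in one pass over the data.

    rows: iterable of (donor, recipient, region, year, amount).
    Each report period P covers data years P-1 and P, as in the original loop.
    Returns {(donor, P): {"top": [(recipient, pct), ...], "regions": {region: pct}}}.
    """
    periods = (latest_year - 20, latest_year - 10, latest_year)
    year_to_periods = defaultdict(list)
    for p in periods:
        year_to_periods[p - 1].append(p)
        year_to_periods[p].append(p)

    totals = defaultdict(float)
    by_recipient = defaultdict(lambda: defaultdict(float))
    by_region = defaultdict(lambda: defaultdict(float))
    for donor, recipient, region, year, amount in rows:
        for p in year_to_periods.get(year, ()):   # .get avoids adding empty keys
            key = (donor, p)
            totals[key] += amount
            by_recipient[key][recipient] += amount
            by_region[key][region] += amount

    report = {}
    for key, total in totals.items():
        if total == 0:
            continue                              # avoid dividing by a zero total
        top = sorted(by_recipient[key].items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        report[key] = {
            "top": [(r, 100.0 * a / total) for r, a in top],
            "regions": {g: 100.0 * a / total for g, a in by_region[key].items()},
        }
    return report


rows = [
    ("D1", "A", "R1", 2012, 60.0), ("D1", "B", "R2", 2011, 40.0),
    ("D1", "A", "R1", 2002, 10.0), ("D2", "C", "R1", 2012, 5.0),
    ("D1", "Z", "R1", 2009, 99.0),  # outside every two-year window, ignored
]
report = donor_report(rows, 2012, top_n=1)
print(report[("D1", 2012)])
```

In T-SQL the same idea would be one GROUP BY over donor, period and recipient (or region), with SUM(...) OVER (PARTITION BY donor, period) supplying the denominator, so the view is scanned once rather than once per donor, year and region.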

How can I maintain a running total in a SQL Server database using VB.NET?

I am using Visual Studio 2010 to build a Windows Forms application to maintain a table in a SQL Server 2008 database. The table is named CASHBOOK, and here are the details:
DATE | DESCRIPTION | DEBIT | CREDIT | BALANCE
--------|----------------|---------|-----------|---------
1/1/2011| CASH BALANCE | | | 5000
1/1/2011| SALES | 2500 | | 7500
2/1/2011| PURCHASE | | 3000 | 4500
2/1/2011| RENT | | 4000 | 500
2/1/2011| SALES | 5000 | | 5500
I can use CASHBOOKTABLEADAPTER.INSERT(...) to insert appropriately, but my problem is how do I update the BALANCE column?
See this article by Alexander Kuznetsov
Denormalizing to enforce business rules: Running Totals
You can try an insert with a subquery, something like the following:
INSERT INTO CASHBOOK ( DESCRIPTION, DEBIT, BALANCE )
SELECT 'asdf', 2500, (SELECT TOP(1) BALANCE FROM CASHBOOK ORDER BY [DATE] DESC) + 2500 -- assumes the latest row by DATE holds the current balance
It's a bit heavy-handed, but here's a way to update the full table with balance information.
update
a
set
a.Balance = (
select sum(isnull(x.debit, 0.0) - isnull(x.credit, 0.0))
from cashbook x
where x.Date < a.Date
or (x.Date = a.Date and x.ID <= a.ID)
) + (
select top 1 y.Balance
from cashbook y
where y.debit is null
and y.credit is null
order by y.ID
)
from
cashbook a
Now that's useful only if you HAVE to have the balance in the table. A more appropriate solution might be to create a UDF that encompasses this logic and call that to calculate the balance field for a specific row only when you need it. It really all depends on your usage.
create function dbo.GetBalance(@id int) returns decimal(12, 2) as
begin
declare @result decimal(12, 2) = 0.0
select
@result = (
select sum(isnull(x.debit, 0.0) - isnull(x.credit, 0.0))
from cashbook x
where x.Date < a.Date
or (x.Date = a.Date and x.ID <= a.ID)
) + (
select top 1 y.Balance
from cashbook y
where y.debit is null
and y.credit is null
order by y.ID
)
from
cashbook a
where
a.ID = @id
return @result
end
Why do you need to? This is something that should be calculated as a reporting/viewing function. I would suggest creating a view with a running total column (there are various ways to achieve this).
Alternatively, if you're viewing this in VB.NET, calculate it in your app.
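As a sketch of calculating the balance in the application instead of storing it, here is the logic in Python (a VB.NET version would be the same loop over the rows). It assumes the rows are already sorted in posting order and uses the CASHBOOK figures from the question; the function name is hypothetical.

```python
from decimal import Decimal


def with_running_balance(rows, opening_balance):
    """Append a running BALANCE to (date, description, debit, credit) rows.

    Rows must already be in posting order; debits add to the balance and
    credits subtract from it, matching the CASHBOOK example.
    """
    balance = opening_balance
    result = []
    for posted, description, debit, credit in rows:
        balance += debit - credit
        result.append((posted, description, debit, credit, balance))
    return result


rows = [
    ("1/1/2011", "SALES", Decimal(2500), Decimal(0)),
    ("2/1/2011", "PURCHASE", Decimal(0), Decimal(3000)),
    ("2/1/2011", "RENT", Decimal(0), Decimal(4000)),
    ("2/1/2011", "SALES", Decimal(5000), Decimal(0)),
]
for row in with_running_balance(rows, opening_balance=Decimal(5000)):
    print(row)
```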
I agree with Joel: you should be calculating this at runtime, not storing the running totals in the database. Here's an example of how to compute running totals using a recursive CTE in SQL Server:
declare @values table (ID int identity(1,1), Value decimal(4,2))
declare @i int
insert into @values values (1.00)
insert into @values values (2.00)
insert into @values values (3.00)
insert into @values values (4.00)
insert into @values values (5.00)
insert into @values values (6.00)
select @i=min(ID) from @values
;with a as
(
select ID, Value, Value as RunningTotal
from @values
where ID=@i
union all
select b.ID, b.Value, cast(b.Value + a.RunningTotal as decimal(4,2)) as RunningTotal
from @values b
inner join a
on b.ID=a.ID+1
)
select * from a
Here's a blog post on recursive queries: Recursive CTEs
Also, here's a lengthy discussion about running totals.
One potential problem with recursive CTEs is the recursion limit: by default the statement fails after 100 levels, and OPTION (MAXRECURSION n) accepts at most 32767 (0 removes the limit), which can be prohibitive in a production environment.
In this solution you add an id column that is ordinal to the transaction sequence and then update the balance column in place.
declare @t table(id int identity(1,1) not null
, [DATE] date not null
, [DESCRIPTION] varchar(80) null
, [DEBIT] money not null default(0)
, [CREDIT] money not null default(0)
, [BALANCE] money not null default(0)
);
declare @bal money=0;
insert into @t([DATE],[DESCRIPTION],[DEBIT],[CREDIT],[BALANCE])
select '1/1/2011','CASH BALANCE',0,0,5000 UNION ALL
select '1/1/2011','SALES',2500,0,0 UNION ALL
select '2/1/2011','PURCHASE',0,3000,0 UNION ALL
select '2/1/2011','RENT',0,4000,0 UNION ALL
select '2/1/2011','SALES',5000,0,0;
set @bal=(select top 1 [BALANCE] from @t order by id); /* opening balance is stored but not computed, so we simply look it up here. */
update t
set @bal=t.[BALANCE]=(t.[DEBIT]-t.[CREDIT])+@bal
output
inserted.*
from @t t
left join @t t0 on t0.id+1=t.id; /*should order by id by default, but to be safe we force the issue here. */

Function to Calculate Median in SQL Server

According to MSDN, Median is not available as an aggregate function in Transact-SQL. However, I would like to find out whether it is possible to create this functionality (using the Create Aggregate function, user defined function, or some other method).
What would be the best way (if possible) to do this - allow for the calculation of a median value (assuming a numeric data type) in an aggregate query?
If you're using SQL 2005 or better this is a nice, simple-ish median calculation for a single column in a table:
SELECT
(
(SELECT MAX(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf)
+
(SELECT MIN(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf)
) / 2 AS Median
2019 UPDATE: In the 10 years since I wrote this answer, more solutions have been uncovered that may yield better results. Also, SQL Server releases since then (especially SQL 2012) have introduced new T-SQL features that can be used to calculate medians. SQL Server releases have also improved the query optimizer, which may affect the performance of various median solutions. Net-net, my original 2009 post is still OK, but there may be better solutions for modern SQL Server apps. Take a look at this article from 2012, which is a great resource: https://sqlperformance.com/2012/08/t-sql-queries/median
This article found the following pattern to be much, much faster than all other alternatives, at least on the simple schema they tested. This solution was 373x faster (!!!) than the slowest (PERCENTILE_CONT) solution tested. Note that this trick requires two separate queries which may not be practical in all cases. It also requires SQL 2012 or later.
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);
SELECT AVG(1.0 * val)
FROM (
SELECT val FROM dbo.EvenRows
ORDER BY val
OFFSET (@c - 1) / 2 ROWS
FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;
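The OFFSET/FETCH arithmetic is easy to misread, so here is a small Python sketch (hypothetical function name) that applies the same formulas to a list: for an odd count it skips (c - 1) / 2 rows and takes one, and for an even count it takes the two middle rows and averages them.

```python
def median_offset_fetch(values):
    """Median using the same index arithmetic as the OFFSET/FETCH query:
    skip (c - 1) // 2 sorted rows, take 1 row if c is odd or 2 if c is even, average them."""
    c = len(values)
    if c == 0:
        return None                      # AVG over zero rows is NULL in SQL
    ordered = sorted(values)
    offset = (c - 1) // 2                # OFFSET (@c - 1) / 2 ROWS
    fetch = 1 + (1 - c % 2)              # FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
    window = ordered[offset:offset + fetch]
    return sum(window) / len(window)     # AVG(1.0 * val)


print(median_offset_fetch([5, 1, 3]))      # 3.0
print(median_offset_fetch([4, 1, 3, 2]))   # 2.5
```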
Of course, just because one test on one schema in 2012 yielded great results, your mileage may vary, especially if you're on SQL Server 2014 or later. If perf is important for your median calculation, I'd strongly suggest trying and perf-testing several of the options recommended in that article to make sure that you've found the best one for your schema.
I'd also be especially careful using the (new in SQL Server 2012) function PERCENTILE_CONT that's recommended in one of the other answers to this question, because the article linked above found this built-in function to be 373x slower than the fastest solution. It's possible that this disparity has been improved in the 7 years since, but personally I wouldn't use this function on a large table until I verified its performance vs. other solutions.
ORIGINAL 2009 POST IS BELOW:
There are lots of ways to do this, with dramatically varying performance. Here's one particularly well-optimized solution, from Medians, ROW_NUMBERs, and performance. This is a particularly optimal solution when it comes to actual I/Os generated during execution – it looks more costly than other solutions, but it is actually much faster.
That page also contains a discussion of other solutions and performance testing details. Note the use of a unique column as a disambiguator in case there are multiple rows with the same value of the median column.
As with all database performance scenarios, always try to test a solution out with real data on real hardware – you never know when a change to SQL Server's optimizer or a peculiarity in your environment will make a normally-speedy solution slower.
SELECT
CustomerId,
AVG(TotalDue)
FROM
(
SELECT
CustomerId,
TotalDue,
-- SalesOrderId in the ORDER BY is a disambiguator to break ties
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc
FROM Sales.SalesOrderHeader SOH
) x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CustomerId
ORDER BY CustomerId;
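Why the WHERE clause picks the middle row(s): because the descending numbering uses the exact reverse of the ascending sort key (including the SalesOrderId tie-breaker), RowDesc always equals n + 1 - RowAsc. The sketch below mimics that in Python, with a hypothetical function name and made-up rows of (CustomerId, SalesOrderId, TotalDue).

```python
from collections import defaultdict


def median_by_customer(rows):
    """rows: (customer_id, sales_order_id, total_due).

    Per customer, number the rows ascending and descending, keep those where
    RowAsc is within 1 of RowDesc, and average them.
    """
    groups = defaultdict(list)
    for customer_id, order_id, total_due in rows:
        groups[customer_id].append((total_due, order_id))
    medians = {}
    for customer_id, items in sorted(groups.items()):
        ordered = sorted(items)              # ORDER BY TotalDue, SalesOrderId
        n = len(ordered)
        picked = []
        for i, (total_due, _) in enumerate(ordered):
            row_asc = i + 1
            row_desc = n - i                 # reverse of the same unique sort key
            if row_asc in (row_desc, row_desc - 1, row_desc + 1):
                picked.append(total_due)
        medians[customer_id] = sum(picked) / len(picked)
    return medians


rows = [(1, 10, 5.0), (1, 11, 1.0), (1, 12, 3.0),
        (2, 20, 4.0), (2, 21, 2.0), (2, 22, 4.0), (2, 23, 10.0)]
print(median_by_customer(rows))   # {1: 3.0, 2: 4.0}
```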
In SQL Server 2012 you should use PERCENTILE_CONT:
SELECT SalesOrderID, OrderQty,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY OrderQty)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
See also : http://blog.sqlauthority.com/2011/11/20/sql-server-introduction-to-percentile_cont-analytic-functions-introduced-in-sql-server-2012/
My original quick answer was:
select max(my_column) as [my_column], quartile
from (select my_column, ntile(4) over (order by my_column) as [quartile]
from my_table) i
--where quartile = 2
group by quartile
This will give you the median and interquartile range in one fell swoop. If you really only want one row that is the median then uncomment the where clause.
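How well this works depends on how NTILE sizes its groups: when the row count is not divisible by the number of groups, the earlier groups get one extra row. The Python sketch below (hypothetical function names) reproduces that rule. It shows that for an odd row count the maximum of quartile 2 is exactly the middle value, while for an even count it is one of the two middle values rather than their average.

```python
def ntile(n_rows, buckets):
    """1-based NTILE bucket for each of n_rows ordered rows: the first
    n_rows % buckets buckets get one extra row (larger groups come first)."""
    base, extra = divmod(n_rows, buckets)
    labels = []
    for b in range(1, buckets + 1):
        labels.extend([b] * (base + (1 if b <= extra else 0)))
    return labels


def quartile_maxima(values):
    """max(value) per quartile, like GROUP BY quartile over ntile(4)."""
    ordered = sorted(values)
    maxima = {}
    for value, quartile in zip(ordered, ntile(len(ordered), 4)):
        maxima[quartile] = value   # ascending order, so the last write is the maximum
    return maxima


print(quartile_maxima(range(1, 10)))   # 9 rows: quartile 2 max is 5, the median
print(quartile_maxima(range(1, 11)))   # 10 rows: quartile 2 max is 6, median is 5.5
```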
When you stick that into an explain plan, 60% of the work is sorting the data which is unavoidable when calculating position dependent statistics like this.
I've amended the answer to follow the excellent suggestion from Robert Ševčík-Robajz in the comments below:
;with PartitionedData as
(select my_column, ntile(10) over (order by my_column) as [percentile]
from my_table),
MinimaAndMaxima as
(select min(my_column) as [low], max(my_column) as [high], percentile
from PartitionedData
group by percentile)
select
case
when b.percentile = 10 then cast(b.high as decimal(18,2))
else cast((a.low + b.high) as decimal(18,2)) / 2
end as [value], --b.high, a.low,
b.percentile
from MinimaAndMaxima a
join MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5
This should calculate the correct median and percentile values when you have an even number of data items. Again, uncomment the final where clause if you only want the median and not the entire percentile distribution.
Even better:
SELECT @Median = AVG(1.0 * val)
FROM
(
SELECT o.val, rn = ROW_NUMBER() OVER (ORDER BY o.val), c.c
FROM dbo.EvenRows AS o
CROSS JOIN (SELECT c = COUNT(*) FROM dbo.EvenRows) AS c
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2);
From the master Himself, Itzik Ben-Gan!
MS SQL Server 2012 (and later) has the PERCENTILE_DISC function which computes a specific percentile for sorted values. PERCENTILE_DISC (0.5) will compute the median - https://msdn.microsoft.com/en-us/library/hh231327.aspx
Simple, fast, accurate
SELECT x.Amount
FROM (SELECT amount,
Count(1) OVER (partition BY 'A') AS TotalRows,
Row_number() OVER (ORDER BY Amount ASC) AS AmountOrder
FROM facttransaction ft) x
WHERE x.AmountOrder = Round(x.TotalRows / 2.0, 0)
If you want to use the Create Aggregate function in SQL Server, this is how to do it. Doing it this way has the benefit of being able to write clean queries. Note this this process could be adapted to calculate a Percentile value fairly easily.
Create a new Visual Studio project and set the target framework to .NET 3.5 (this is for SQL 2008, it may be different in SQL 2012). Then create a class file and put in the following code, or c# equivalent:
Imports Microsoft.SqlServer.Server
Imports System.Collections.Generic
Imports System.Data.SqlTypes
Imports System.Globalization
Imports System.IO
Imports System.Linq
<Serializable>
<SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToNulls:=True, IsInvariantToDuplicates:=False, _
IsInvariantToOrder:=True, MaxByteSize:=-1, IsNullIfEmpty:=True)>
Public Class Median
Implements IBinarySerialize
Private _items As List(Of Decimal)
Public Sub Init()
_items = New List(Of Decimal)()
End Sub
Public Sub Accumulate(value As SqlDecimal)
If Not value.IsNull Then
_items.Add(value.Value)
End If
End Sub
Public Sub Merge(other As Median)
If other._items IsNot Nothing Then
_items.AddRange(other._items)
End If
End Sub
Public Function Terminate() As SqlDecimal
If _items.Count <> 0 Then
Dim result As Decimal
_items = _items.OrderBy(Function(i) i).ToList()
If _items.Count Mod 2 = 0 Then
'integer division (\) keeps the list indexes whole numbers
result = ((_items((_items.Count \ 2) - 1)) + (_items(_items.Count \ 2))) / 2D
Else
result = _items((_items.Count - 1) \ 2)
End If
Return New SqlDecimal(result)
Else
Return New SqlDecimal()
End If
End Function
Public Sub Read(r As BinaryReader) Implements IBinarySerialize.Read
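'deserialize it from a comma-separated string (invariant culture, so "." is always the decimal separator)
Dim list = r.ReadString()
_items = New List(Of Decimal)
For Each value In list.Split(","c)
Dim number As Decimal
If Decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, number) Then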
_items.Add(number)
End If
Next
End Sub
Public Sub Write(w As BinaryWriter) Implements IBinarySerialize.Write
'serialize the list to a string
Dim list = ""
For Each item In _items
If list <> "" Then
list += ","
End If
list += item.ToString(CultureInfo.InvariantCulture)
Next
w.Write(list)
End Sub
End Class
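Then compile it, copy the DLL and PDB files to your SQL Server machine, and run the following commands in SQL Server. (CLR integration must also be enabled on the instance for the aggregate to run, e.g. EXEC sp_configure 'clr enabled', 1; RECONFIGURE;)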
CREATE ASSEMBLY CustomAggregate FROM '{path to your DLL}'
WITH PERMISSION_SET=SAFE;
GO
CREATE AGGREGATE Median(@value decimal(9, 3))
RETURNS decimal(9, 3)
EXTERNAL NAME [CustomAggregate].[{namespace of your DLL}.Median];
GO
You can then write a query to calculate the median like this:
SELECT dbo.Median(Field) FROM Table
I just came across this page while looking for a set-based solution to the median. After looking at some of the solutions here, I came up with the following. Hope it helps/works.
DECLARE @test TABLE(
i int identity(1,1),
id int,
score float
)
INSERT INTO @test (id,score) VALUES (1,10)
INSERT INTO @test (id,score) VALUES (1,11)
INSERT INTO @test (id,score) VALUES (1,15)
INSERT INTO @test (id,score) VALUES (1,19)
INSERT INTO @test (id,score) VALUES (1,20)
INSERT INTO @test (id,score) VALUES (2,20)
INSERT INTO @test (id,score) VALUES (2,21)
INSERT INTO @test (id,score) VALUES (2,25)
INSERT INTO @test (id,score) VALUES (2,29)
INSERT INTO @test (id,score) VALUES (2,30)
INSERT INTO @test (id,score) VALUES (3,20)
INSERT INTO @test (id,score) VALUES (3,21)
INSERT INTO @test (id,score) VALUES (3,25)
INSERT INTO @test (id,score) VALUES (3,29)
DECLARE @counts TABLE(
id int,
cnt int
)
INSERT INTO @counts (
id,
cnt
)
SELECT
id,
COUNT(*)
FROM
@test
GROUP BY
id
SELECT
drv.id,
drv.start,
AVG(t.score)
FROM
(
SELECT
MIN(t.i)-1 AS start,
t.id
FROM
@test t
GROUP BY
t.id
) drv
INNER JOIN @test t ON drv.id = t.id
INNER JOIN @counts c ON t.id = c.id
WHERE
t.i = ((c.cnt+1)/2)+drv.start
OR (
t.i = (((c.cnt+1)%2) * ((c.cnt+2)/2))+drv.start
AND ((c.cnt+1)%2) * ((c.cnt+2)/2) <> 0
)
GROUP BY
drv.id,
drv.start
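The two index expressions in the WHERE clause can be traced with a small Python model. The names `median_offsets` and `grouped_medians` are hypothetical. The sample data is the same as in the inserts above. Offsets are 1-based positions within each id, matching `t.i - drv.start` in the query.

```python
def median_offsets(cnt):
    # t.i = ((c.cnt + 1) / 2) + drv.start  -> always one candidate row
    first = (cnt + 1) // 2
    # ((c.cnt + 1) % 2) * ((c.cnt + 2) / 2) is 0 when cnt is odd, which is
    # why the query also requires that expression to be <> 0
    second = ((cnt + 1) % 2) * ((cnt + 2) // 2)
    return [first] if second == 0 else [first, second]

def grouped_medians(rows):
    # rows are (id, score) pairs; like the identity column i in the test
    # table, this assumes each id's scores arrive together, in ascending order
    groups = {}
    for gid, score in rows:
        groups.setdefault(gid, []).append(score)
    result = {}
    for gid, scores in groups.items():
        picked = [scores[k - 1] for k in median_offsets(len(scores))]
        result[gid] = sum(picked) / len(picked)
    return result

data = [(1, 10), (1, 11), (1, 15), (1, 19), (1, 20),
        (2, 20), (2, 21), (2, 25), (2, 29), (2, 30),
        (3, 20), (3, 21), (3, 25), (3, 29)]
print(grouped_medians(data))  # {1: 15.0, 2: 25.0, 3: 23.0}
```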
The following query returns the median from a list of values in one column. It cannot be used as or along with an aggregate function, but you can still use it as a sub-query with a WHERE clause in the inner select.
SQL Server 2005+:
SELECT TOP 1 value from
(
SELECT TOP 50 PERCENT value
FROM table_name
ORDER BY value
)for_median
ORDER BY value DESC
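When the row count is even, this query returns the lower of the two middle values. The minimal Python sketch below shows why (the function name is hypothetical). It relies on the documented behavior that TOP ... PERCENT rounds a fractional row count up to the next integer.

```python
import math

def top_half_max(values):
    # SELECT TOP 50 PERCENT ... ORDER BY value: n rows yield ceil(n / 2) rows
    ordered = sorted(values)
    keep = math.ceil(len(ordered) * 50 / 100)
    bottom_half = ordered[:keep]
    # outer query: TOP 1 ... ORDER BY value DESC, i.e. the largest kept value
    return bottom_half[-1] if bottom_half else None

print(top_half_max([5, 3, 1, 4, 2]))  # 3
print(top_half_max([4, 1, 3, 2]))     # 2
```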
Although Justin Grant's solution appears solid, I found that when you have a number of duplicate values within a given partition key, the row numbers for the ascending duplicate values end up out of sequence, so they do not align properly.
Here is a fragment from my result:
KEY VALUE ROWA ROWD
13 2 22 182
13 1 6 183
13 1 7 184
13 1 8 185
13 1 9 186
13 1 10 187
13 1 11 188
13 1 12 189
13 0 1 190
13 0 2 191
13 0 3 192
13 0 4 193
13 0 5 194
I used Justin's code as the basis for this solution. Although not as efficient given the use of multiple derived tables it does resolve the row ordering problem I encountered. Any improvements would be welcome as I am not that experienced in T-SQL.
SELECT PKEY, cast(AVG(VALUE)as decimal(5,2)) as MEDIANVALUE
FROM
(
SELECT PKEY,VALUE,ROWA,ROWD,
'FLAG' = (CASE WHEN ROWA IN (ROWD,ROWD-1,ROWD+1) THEN 1 ELSE 0 END)
FROM
(
SELECT
PKEY,
cast(VALUE as decimal(5,2)) as VALUE,
ROWA,
ROW_NUMBER() OVER (PARTITION BY PKEY ORDER BY ROWA DESC) as ROWD
FROM
(
SELECT
PKEY,
VALUE,
ROW_NUMBER() OVER (PARTITION BY PKEY ORDER BY VALUE ASC,PKEY ASC ) as ROWA
FROM [MTEST]
)T1
)T2
)T3
WHERE FLAG = '1'
GROUP BY PKEY
ORDER BY PKEY
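The FLAG test still works with duplicates. ROWD is numbered from ROWA, which is already unique, rather than from VALUE, so the two numberings mirror each other exactly (ROWD = n + 1 - ROWA). Below is a rough Python model of one partition. The function name is hypothetical.

```python
def flag_median(values):
    ordered = sorted(values)
    n = len(ordered)
    picked = []
    for rowa, v in enumerate(ordered, start=1):
        rowd = n + 1 - rowa                       # ORDER BY ROWA DESC
        if rowa in (rowd, rowd - 1, rowd + 1):    # the FLAG = 1 condition
            picked.append(v)
    return sum(picked) / len(picked) if picked else None

print(flag_median([4, 1, 3, 2]))  # 2.5
```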
In a UDF, write:
Select Top 1 T.medianSortColumn from Table T
Where (Select Count(*) from Table
Where medianSortColumn <= T.medianSortColumn) >=
(Select Count(*) + 1 From Table) / 2
Order By T.medianSortColumn
Justin's example above is very good, but the need for a primary key should be stated very clearly. I have seen that code in the wild without the key, and the results are bad.
The complaint I get about PERCENTILE_CONT is that it won't give you an actual value from the dataset.
To get to a "median" that is an actual value from the dataset use Percentile_Disc.
SELECT SalesOrderID, OrderQty,
PERCENTILE_DISC(0.5)
WITHIN GROUP (ORDER BY OrderQty)
OVER (PARTITION BY SalesOrderID) AS MedianDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
Using a single statement: one way is to use the ROW_NUMBER() and COUNT() window functions and filter the subquery. Here it is used to find the median salary:
SELECT AVG(e_salary)
FROM
(SELECT
ROW_NUMBER() OVER(ORDER BY e_salary) as row_no,
e_salary,
(COUNT(*) OVER()+1)*0.5 AS row_half
FROM Employee) t
WHERE row_no IN (FLOOR(row_half),CEILING(row_half))
I have seen similar solutions on the web using FLOOR and CEILING, but I tried to use a single statement.
Median Finding
This is the simplest method to find the median of an attribute.
Select round(S.salary,4) median from employee S
where (select count(salary) from employee
where salary < S.salary ) = (select count(salary) from employee
where salary > S.salary)
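A short Python model of the WHERE clause (the function name is hypothetical) shows its limits. It returns no row when the values are all distinct and their count is even. When the middle value is duplicated, it returns one row per copy.

```python
def balanced_values(values):
    # Keep each value that has as many values strictly below it as above it
    return [s for s in values
            if sum(1 for x in values if x < s) == sum(1 for x in values if x > s)]

print(balanced_values([3, 1, 2]))     # [2]
print(balanced_values([1, 2, 3, 4]))  # []
print(balanced_values([1, 2, 2, 3]))  # [2, 2]
```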
See other solutions for median calculation in SQL here:
"Simple way to calculate median with MySQL" (the solutions are mostly vendor-independent).
Building on Jeff Atwood's answer above, here it is with GROUP BY and a correlated subquery to get the median for each group.
SELECT TestID,
(
(SELECT MAX(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts WHERE TestID = Posts_parent.TestID ORDER BY Score) AS BottomHalf)
+
(SELECT MIN(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts WHERE TestID = Posts_parent.TestID ORDER BY Score DESC) AS TopHalf)
) / 2 AS MedianScore,
AVG(Score) AS AvgScore, MIN(Score) AS MinScore, MAX(Score) AS MaxScore
FROM Posts AS Posts_parent
GROUP BY Posts_parent.TestID
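For each group, the expression averages the largest value of the lower half and the smallest value of the upper half. Below is a minimal Python model for a single TestID. The function name is hypothetical, the input is assumed non-empty, and it again relies on TOP 50 PERCENT rounding the row count up. Python's `/` is true division. In T-SQL, if Score is an integer column, `/ 2` truncates, so dividing by 2.0 may be preferable there.

```python
import math

def halves_median(scores):
    ordered = sorted(scores)
    keep = math.ceil(len(ordered) / 2)  # TOP 50 PERCENT rounds the row count up
    bottom_max = max(ordered[:keep])    # MAX over the lowest half (BottomHalf)
    top_min = min(ordered[-keep:])      # MIN over the highest half (TopHalf)
    return (bottom_max + top_min) / 2

print(halves_median([5, 1, 4, 2, 3]))  # 3.0
print(halves_median([4, 1, 3, 2]))     # 2.5
```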
For a continuous variable/measure 'col1' from 'table1'
select col1
from
(select col1,
ROW_NUMBER() OVER(ORDER BY col1 ASC) AS Rowa,
ROW_NUMBER() OVER(ORDER BY col1 DESC) AS Rowd
from table1 ) tmp
where tmp.Rowa = tmp.Rowd
Frequently, we may need to calculate the median not just for the whole table but for aggregates with respect to some ID. In other words, calculate the median for each ID in our table, where each ID has many records. (Based on the solution edited by @gdoron: good performance and works in many SQL dialects.)
SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val,
COUNT(*) OVER (PARTITION BY our_id) AS cnt,
ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rnk
FROM our_table
) AS x
WHERE rnk IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;
Hope it helps.
For large scale datasets, you can try this GIST:
https://gist.github.com/chrisknoll/1b38761ce8c5016ec5b2
It works by aggregating the distinct values you would find in your set (such as ages, or year of birth, etc.), and uses SQL window functions to locate any percentile position you specify in the query.
To get the median salary from the employees table:
with cte as (select salary, ROW_NUMBER() over (order by salary asc) as num from employees)
select avg(salary) from cte where num in ((select (count(*)+1)/2 from employees), (select (count(*)+2)/2 from employees));
I wanted to work out a solution by myself, but my brain tripped and fell on the way. I think it works, but don't ask me to explain it in the morning. :P
DECLARE @table AS TABLE
(
Number int not null
);
insert into @table select 2;
insert into @table select 4;
insert into @table select 9;
insert into @table select 15;
insert into @table select 22;
insert into @table select 26;
insert into @table select 37;
insert into @table select 49;
DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;
WITH MyResults(RowNo, Number) AS
(
SELECT RowNo, Number FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Number) AS RowNo, Number FROM @table) AS Foo
)
SELECT AVG(1.0 * Number) FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)
--Create Temp Table to Store Results in
DECLARE @results AS TABLE
(
[Month] datetime not null
,[Median] int not null
);
--This variable will determine the date
DECLARE @IntDate as int
set @IntDate = -13
WHILE (@IntDate < 0)
BEGIN
--Create Temp Table
DECLARE @table AS TABLE
(
[Rank] int not null
,[Days Open] int not null
);
--Insert records into Temp Table
insert into @table
SELECT
rank() OVER (ORDER BY DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0), DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')),[SVR].[ref_num]) as [Rank]
,DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')) as [Days Open]
FROM
mdbrpt.dbo.View_Request SVR
LEFT OUTER JOIN dbo.dtv_apps_systems vapp
on SVR.category = vapp.persid
LEFT OUTER JOIN dbo.prob_ctg pctg
on SVR.category = pctg.persid
Left Outer Join [mdbrpt].[dbo].[rootcause] as [Root Cause]
on [SVR].[rootcause]=[Root Cause].[id]
Left Outer Join [mdbrpt].[dbo].[cr_stat] as [Status]
on [SVR].[status]=[Status].[code]
LEFT OUTER JOIN [mdbrpt].[dbo].[net_res] as [net]
on [net].[id]=SVR.[affected_rc]
WHERE
SVR.Type IN ('P')
AND
SVR.close_date IS NOT NULL
AND
[Status].[SYM] = 'Closed'
AND
SVR.parent is null
AND
[Root Cause].[sym] in ( 'RC - Application','RC - Hardware', 'RC - Operational', 'RC - Unknown')
AND
(
[vapp].[appl_name] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
OR
pctg.sym in ('Systems.Release Health Dashboard.Problem','DTV QA Test.Enterprise Release.Deferred Defect Log')
AND
[Net].[nr_desc] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
)
AND
DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0) = DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0)
ORDER BY [Days Open]
DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;
WITH MyResults(RowNo, [Days Open]) AS
(
SELECT RowNo, [Days Open] FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [Days Open]) AS RowNo, [Days Open] FROM @table) AS Foo
)
insert into @results
SELECT
DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0) as [Month]
,AVG([Days Open])as [Median] FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)
set @IntDate = @IntDate+1
DELETE FROM @table
END
select *
from @results
order by [Month]
This works with SQL 2000:
DECLARE @testTable TABLE
(
VALUE INT
)
--INSERT INTO @testTable -- Even Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56
--
--INSERT INTO @testTable -- Odd Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 39 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56
DECLARE @RowAsc TABLE
(
ID INT IDENTITY,
Amount INT
)
INSERT INTO @RowAsc
SELECT VALUE
FROM @testTable
ORDER BY VALUE ASC
-- 1.0 * avoids integer truncation of AVG over an INT column
SELECT AVG(1.0 * amount)
FROM @RowAsc ra
WHERE ra.id IN
(
SELECT ID
FROM @RowAsc
WHERE ra.id -
(
SELECT MAX(id) / 2.0
FROM @RowAsc
) BETWEEN 0 AND 1
)
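The BETWEEN 0 AND 1 window keeps one identity value for an odd count and two for an even count. Below is a rough Python model (the function name is hypothetical). It assumes the identity values run 1..n without gaps, as they do after a single insert into a freshly declared table variable, and it is checked against the commented-out even and odd test sets above. It multiplies by 1.0 because in T-SQL, AVG over an INT column returns an INT.

```python
def identity_window_median(values):
    ordered = sorted(values)                # INSERT ... ORDER BY VALUE ASC
    half = len(ordered) / 2.0               # MAX(id) / 2.0, ids being 1..n
    picked = [v for id_, v in enumerate(ordered, start=1)
              if 0 <= id_ - half <= 1]      # BETWEEN is inclusive on both ends
    return sum(1.0 * v for v in picked) / len(picked) if picked else None

even = [3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56]
odd = [3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56]
print(identity_window_median(even))  # 22.0
print(identity_window_median(odd))   # 23.0
```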
For newbies like myself who are learning the very basics, I personally find this example easier to follow, as it is easier to understand exactly what's happening and where median values are coming from...
select
( max(a.[Value1]) + min(a.[Value1]) ) / 2 as [Median Value1]
,( max(a.[Value2]) + min(a.[Value2]) ) / 2 as [Median Value2]
from (select
datediff(dd,startdate,enddate) as [Value1]
,xxxxxxxxxxxxxx as [Value2]
from dbo.table1
)a
In absolute awe of some of the code above, though!
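Note that (MAX + MIN) / 2 is the midrange, the midpoint between the smallest and largest values. That is a different statistic from the median. The two coincide for evenly spaced data such as 1, 2, 3, but a single outlier separates them, as this small Python check shows. `midrange` is a hypothetical helper, and the standard library's `statistics.median` is used for comparison.

```python
import statistics

def midrange(values):
    # (MAX + MIN) / 2, which is what the query above computes per column
    return (max(values) + min(values)) / 2

data = [1, 2, 3, 4, 100]
print(midrange(data))           # 50.5
print(statistics.median(data))  # 3
```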
This is as simple an answer as I could come up with. Worked well with my data. If you want to exclude certain values just add a where clause to the inner select.
SELECT TOP 1
ValueField AS MedianValue
FROM
(SELECT TOP(SELECT (COUNT(1) + 1)/2 FROM tTABLE)
ValueField
FROM
tTABLE
ORDER BY
ValueField) A
ORDER BY
ValueField DESC
The following solution works under these assumptions:
No duplicate values
No NULLs
Code:
IF OBJECT_ID('dbo.R', 'U') IS NOT NULL
DROP TABLE dbo.R
CREATE TABLE R (
A FLOAT NOT NULL);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (2);
INSERT INTO R VALUES (3);
INSERT INTO R VALUES (4);
INSERT INTO R VALUES (5);
INSERT INTO R VALUES (6);
-- Returns Median(R)
select SUM(A) / CAST(COUNT(A) AS FLOAT)
from R R1
where ((select count(A) from R R2 where R1.A > R2.A) =
(select count(A) from R R2 where R1.A < R2.A)) OR
((select count(A) from R R2 where R1.A > R2.A) + 1 =
(select count(A) from R R2 where R1.A < R2.A)) OR
((select count(A) from R R2 where R1.A > R2.A) =
(select count(A) from R R2 where R1.A < R2.A) + 1) ;
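The three OR branches accept a value in two cases. Either its counts of smaller and larger values are equal (the middle value of an odd count), or they differ by one (the two middle values of an even count). Below is a minimal Python model under the same assumptions the answer states: no duplicates and no NULLs. The function name is hypothetical.

```python
def count_balance_median(values):
    def below(a):
        return sum(1 for x in values if x < a)  # count(A) where R1.A > R2.A
    def above(a):
        return sum(1 for x in values if x > a)  # count(A) where R1.A < R2.A
    picked = [a for a in values
              if below(a) == above(a)
              or below(a) + 1 == above(a)
              or below(a) == above(a) + 1]
    if not picked:
        return None  # empty table: SUM / COUNT yields NULL
    return sum(picked) / float(len(picked))

print(count_balance_median([1, 2, 3, 4, 5, 6]))  # 3.5
print(count_balance_median([5, 1, 4, 2, 3]))     # 3.0
```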
DECLARE @Obs int
DECLARE @RowAsc table
(
ID INT IDENTITY,
Observation FLOAT
)
INSERT INTO @RowAsc
SELECT Observations FROM MyTable
ORDER BY 1
-- (n + 1) / 2 is the middle row for an odd count and the lower middle row for an even count
SELECT @Obs=(COUNT(*)+1)/2 FROM @RowAsc
SELECT Observation AS Median FROM @RowAsc WHERE ID=@Obs
I tried several alternatives, but because my data records have repeated values, the ROW_NUMBER versions did not seem to be an option for me. So here is the query I used (a version with NTILE):
SELECT distinct
CustomerId,
(
MAX(CASE WHEN Percent50_Asc=1 THEN TotalDue END) OVER (PARTITION BY CustomerId) +
MIN(CASE WHEN Percent50_desc=1 THEN TotalDue END) OVER (PARTITION BY CustomerId)
)/2 MEDIAN
FROM
(
SELECT
CustomerId,
TotalDue,
NTILE(2) OVER (
PARTITION BY CustomerId
ORDER BY TotalDue ASC) AS Percent50_Asc,
NTILE(2) OVER (
PARTITION BY CustomerId
ORDER BY TotalDue DESC) AS Percent50_desc
FROM Sales.SalesOrderHeader SOH
) x
ORDER BY CustomerId;
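NTILE(2) puts the extra row of an odd-sized partition into the first bucket. Bucket 1 of the ascending order and bucket 1 of the descending order therefore overlap on the middle value. With an even count they meet at the two middle values. Below is a rough Python model of one CustomerId partition. The function names are hypothetical and the input is assumed non-empty. It follows the documented NTILE rule that, when rows do not divide evenly, the larger groups come first.

```python
def ntile(n_rows, buckets):
    # Bucket sizes differ by at most one; larger buckets come first
    base, extra = divmod(n_rows, buckets)
    labels = []
    for b in range(1, buckets + 1):
        labels += [b] * (base + (1 if b <= extra else 0))
    return labels

def ntile_median(values):
    asc = sorted(values)
    desc = sorted(values, reverse=True)
    # MAX over Percent50_Asc = 1 and MIN over Percent50_desc = 1
    lower = max(v for v, t in zip(asc, ntile(len(asc), 2)) if t == 1)
    upper = min(v for v, t in zip(desc, ntile(len(desc), 2)) if t == 1)
    return (lower + upper) / 2

print(ntile(5, 2))                     # [1, 1, 1, 2, 2]
print(ntile_median([5, 1, 4, 2, 3]))   # 3.0
print(ntile_median([1, 1, 1, 5]))      # 1.0
```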
For your question, Jeff Atwood has already given a simple and effective solution. But if you are looking for an alternative approach to calculating the median, the SQL code below will help you.
create table employees(salary int);
insert into employees values(8); insert into employees values(23); insert into employees values(45); insert into employees values(123); insert into employees values(93); insert into employees values(2342); insert into employees values(2238);
select * from employees;
declare @odd_even int; declare @cnt int; declare @middle_no int;
set @cnt=(select count(*) from employees); set @middle_no=(@cnt/2)+1; select @odd_even=case when (@cnt%2=0) THEN -1 ELSE 0 END ;
select AVG(1.0 * tbl.salary) from (select salary,ROW_NUMBER() over (order by salary) as rno from employees) tbl where tbl.rno=@middle_no or tbl.rno=@middle_no+@odd_even;
If you are looking to calculate the median in MySQL, this GitHub link will be useful.
