How to sum if within percentile in SQL Server?

I have a table that looks something like this:
It contains more than 100k rows.
I know how to get the median (or other percentile) values per week:
SELECT DISTINCT week,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY visits) OVER (PARTITION BY week) AS visit_median
FROM table
ORDER BY week
But how do I return a column with the total visits within the top N percentile of the group per week?

I don't think you want percentile_cont(). You can try using ntile(). For instance, the top decile:
SELECT week, SUM(visits)
FROM (SELECT t.*,
NTILE(100) OVER (PARTITION BY week ORDER BY visits DESC) as tile
FROM table
) t
WHERE tile <= 10
GROUP BY week
ORDER BY week;
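You need to understand how NTILE() handles ties. It assigns rows to tiles by position in the sort order, so that the sizes of the tiles differ by at most 1. As a result, rows with the same number of visits can go into different tiles. Also note that when a week has fewer than 100 rows, each row gets its own tile, so tile <= 10 returns the top 10 rows rather than the top 10%. This may or may not be what you really want.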
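If tied rows should be kept together, one alternative is to compare each row against a per-week percentile threshold instead of tiling. A minimal sketch, assuming a hypothetical table named visits_by_week with the week and visits columns from the question; the 0.9 cutoff corresponds to the top 10% and is only an example:

```sql
-- visits_by_week is a hypothetical table name; week and visits come from the question.
SELECT week, SUM(visits) AS top_decile_visits
FROM (SELECT t.week, t.visits,
             -- 90th percentile of visits within each week
             PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY t.visits)
                 OVER (PARTITION BY t.week) AS p90
      FROM visits_by_week t
     ) t
WHERE visits >= p90   -- rows with equal visits are all kept or all dropped
GROUP BY week
ORDER BY week;
```

Because the filter is on the value rather than on a row position, the number of rows summed per week can vary when there are ties at the threshold.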

Related

How can I retrieve "exception" data from a table without knowing the data in advance?

I have a table that updates all the time.
The table maintains a list that links stores to clubs, and manages, among other things, "discount percentages" per store + club.
Table name: Policy_supplier
Column: POLXSUP_DISCOUNT
Suppose all the "vendors" in the table are marked with a 10% discount.
And someone accidentally signs one vendor with 8% or 15% (or even NULL)
How do I generate a query to retrieve the "abnormal" vendor?
You can find the mode of your discounts and then just pick out the records that aren't equal to that mode:
WITH mode_discount AS (SELECT TOP 1 POLXSUP_DISCOUNT FROM Policy_supplier GROUP BY POLXSUP_DISCOUNT ORDER BY COUNT(*) DESC)
SELECT * FROM Policy_supplier
WHERE POLXSUP_DISCOUNT <> (SELECT POLXSUP_DISCOUNT FROM mode_discount)
OR POLXSUP_DISCOUNT IS NULL; -- <> never matches NULL, so check for it explicitly
You can use the OVER clause with aggregates to calculate an aggregate over a data range and include it in the results. For example,
SELECT avg(POLXSUP_DISCOUNT)
from Policy_supplier
Would return a single average value while
SELECT POLXSUP_DISCOUNT, avg(POLXSUP_DISCOUNT) OVER()
from Policy_supplier
Would return the overall average in each row. Typically OVER is used with a PARTITION BY clause. If you wanted the average per supplier you could have written AVG() OVER(PARTITION BY supplierID).
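As a sketch of the partitioned form, assuming the table has the supplierID column mentioned above (it does not appear in the question, so treat the name as hypothetical):

```sql
-- supplierID is an assumed column name taken from the text above.
SELECT supplierID,
       POLXSUP_DISCOUNT,
       -- average discount across all rows that share this row's supplierID
       AVG(POLXSUP_DISCOUNT) OVER (PARTITION BY supplierID) AS avg_supplier_discount
FROM Policy_supplier;
```

Every row is still returned; the average is simply repeated on each row of the same supplier.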
To find anomalies, you should use one of the PERCENTILE functions, eg PERCENTILE_CONT. For example
select PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over()
from Policy_Supplier
This returns the discount value below which 95% of the records fall. The remaining 5% of discounts above it are candidates for anomalies.
Similarly, PERCENTILE_CONT(0.05) returns the discount value below which 5% of the records fall.
You can combine both to find potentially exceptional records, e.g.:
with percentiles as (
select ID,
POLXSUP_DISCOUNT,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over() as pct95,
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over() as pct05
from Policy_Supplier)
select ID,POLXSUP_DISCOUNT
from percentiles
where POLXSUP_DISCOUNT>pct95 or POLXSUP_DISCOUNT<pct05
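PERCENTILE_CONT ignores NULLs, and a comparison with NULL is never true, so the query above will not return a vendor whose discount is NULL. A sketch of a variant that also returns those rows, using the same ID and POLXSUP_DISCOUNT columns:

```sql
WITH percentiles AS (
    SELECT ID,
           POLXSUP_DISCOUNT,
           PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) OVER () AS pct95,
           PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) OVER () AS pct05
    FROM Policy_Supplier
)
SELECT ID, POLXSUP_DISCOUNT
FROM percentiles
WHERE POLXSUP_DISCOUNT > pct95
   OR POLXSUP_DISCOUNT < pct05
   OR POLXSUP_DISCOUNT IS NULL;  -- missing discounts count as exceptions too
```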

SQL Server - Select top 10 providers based on charge amounts from 3rd quarter

I'm having some trouble figuring out what I need to do to make this work and meet these requirements:
Generate a SELECT statement that figures out the top 10 providers from
the credit db based on total charge amounts from the 3rd quarter.
Here's what I have so far:
select top 10 provider_no, charge_amt, charge_dt
from charge
group by provider_no
order by charge_amt desc;
Could I get some help finishing this query so it shows only 10 lines and doesn't have repeating provider_no's sorted by the charge_amt with the charge_dt in the 3rd quarter of the year?
I assume that charge_dt is the date and time of the charge, so the statement should look like this:
select top 10 provider_no, datepart(qq, charge_dt), sum(charge_amt)
from charge
where datepart(qq, charge_dt) = 3
group by provider_no, datepart(qq, charge_dt)
order by sum(charge_amt) desc
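Note that datepart(qq, charge_dt) = 3 matches the third quarter of every year in the table. If only one year is wanted, a date range keeps the filter to that quarter and lets an index on charge_dt be used. A sketch, assuming the year of interest is 2023 (the question does not name one):

```sql
-- The year 2023 is an assumption for illustration only.
SELECT TOP 10 provider_no, SUM(charge_amt) AS total_charges
FROM charge
WHERE charge_dt >= '20230701'   -- first day of Q3
  AND charge_dt <  '20231001'   -- first day of Q4 (exclusive upper bound)
GROUP BY provider_no
ORDER BY total_charges DESC;
```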

How can I sum certain columns based on individual dates and unique IDs

The attached image represents my SQL table of data. I am trying to create two new columns that hold the total minutes and the count of services per distinct case number for each day. For example, if a case number appears three times, that means three services were provided, but I want the minutes and services totaled separately for each day and each case number. I've tried the query below; it gives me the count of services, but not the count per day or the total minutes.
SELECT *,
COUNT(CASE_NUM) OVER (PARTITION BY CASE_NUM) as AGGRE_SERVICES,
SUM(TOTAL_MIN) OVER (PARTITION BY DATE) as AGGRE_MINS
FROM #SQLTABLE
Partition both window functions by case number and date, so that each aggregate is computed per case per day:
SELECT *,
COUNT(CASE_NUM) OVER (PARTITION BY CASE_NUM, DATE) as AGGRE_SERVICES,
SUM(TOTAL_MIN) OVER (PARTITION BY CASE_NUM, DATE) as AGGRE_MINS
FROM #SQLTABLE
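If one row per case and day is enough, rather than the totals repeated on every detail row, a plain GROUP BY produces the same numbers. A sketch, assuming the DATE column holds a date with no time part (otherwise it would need to be cast to a date first):

```sql
SELECT CASE_NUM,
       [DATE],
       COUNT(CASE_NUM) AS AGGRE_SERVICES,  -- services per case per day
       SUM(TOTAL_MIN)  AS AGGRE_MINS       -- minutes per case per day
FROM #SQLTABLE
GROUP BY CASE_NUM, [DATE]
ORDER BY CASE_NUM, [DATE];
```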

Compute sum for distinct order numbers in ssrs report

I'm using a SQL Server 2008R2 Database and SSRS Report Builder 3.0
I'm trying to compute the sum of the amount owed for each order ID (I need to show the item IDs), but when I do, the amount owed shows 400 instead of 200 in line 4 and 100 instead of 50 in line 7; line 9 is correct. As a result, the Total line is way off.
=Sum(Fields!owe.Value)
The report is grouped by the campus.
I understand that SSRS is probably not the best place to do this computation, but I don't know how to do it outside of SSRS. So far I have tried DISTINCT and GROUP BY, with no results.
Below is how I need the report to show like....
Thanks in advance.
Incorrect amounts are
Another example as it should display the subtotals
I would modify the SQL to produce an extra column just for purposes of summing the Owe on an OrderId. Use the Row Number to get the first item in each order, and only supply the Owe value for that item for each order:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY ItemId) AS rn
FROM MyTable
WHERE (whatever filters you use)
)
SELECT *,
CASE WHEN rn=1 THEN Owe ELSE 0 END AS OrderOwe
FROM cte
ORDER BY Campus, CustomerId, OrderId, ItemId
Then simply change the expression for the "Owe" textbox in your SubTotal row to this:
=Sum(Fields!OrderOwe.Value)
And you will get the sum of the Owe per order instead of per item.
If Owe is always the same for every item in an order, you could instead divide the Sum of Owe by the Count of items in the order group, which would give the correct result in all the cases above.

How to get analysis result faster using Partition?

I have a table in SQL Server 2012, which has these 2 columns:
Date, Amount
I want to get the summary for a month, e.g. May 2013. I also want the summary of the previous month, the same month of the previous year, and the average of the past 12 months. I know I can use GROUP BY to get the data for each month and then derive everything I need. However, the table has so many rows that I want to make it faster.
One possibility is to use Partition By
SELECT DISTINCT YEAR(Date), MONTH(Date), SUM(Amount) OVER (PARTITION BY YEAR(Date), MONTH(Date))
FROM myTable
However, how can I use this to get data like last month, the same month last year, and the average of the past 12 months?
Or do I need to use PARTITION BY to get the monthly data, and then use ROWS to get them?
Any ideas?
Thanks
The key idea is to first aggregate the data in a subquery or CTE. Then you can express the conditions you want using window functions:
SELECT yr, mon, amount,
LAG(Amount) OVER (ORDER BY yr*100+mon) as LastMonth,
LAG(Amount, 12) OVER (ORDER BY yr*100+mon) as LastYearMonth,
AVG(Amount) OVER (ORDER BY yr*100 + mon ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) as Avg12Months -- ROWS, not RANGE: SQL Server does not accept a numeric offset with RANGE
FROM (SELECT YEAR(Date) as yr, MONTH(Date) as mon, SUM(Amount) as Amount
FROM myTable
GROUP BY YEAR(Date), MONTH(Date)
) ym;
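To return only one month, such as May 2013 from the question, the filter has to be applied after the window functions have been computed; filtering the monthly rows first would leave LAG and AVG nothing to look back at. A sketch using two CTEs, assuming every month in the range has at least one row (otherwise "12 rows back" is not "12 months back"); the column aliases are illustrative:

```sql
WITH monthly AS (
    -- one row per calendar month
    SELECT YEAR([Date]) AS yr, MONTH([Date]) AS mon, SUM(Amount) AS Amount
    FROM myTable
    GROUP BY YEAR([Date]), MONTH([Date])
),
windowed AS (
    SELECT yr, mon, Amount,
           LAG(Amount)     OVER (ORDER BY yr, mon) AS LastMonth,
           LAG(Amount, 12) OVER (ORDER BY yr, mon) AS LastYearMonth,
           -- current month plus the 11 months before it
           AVG(Amount)     OVER (ORDER BY yr, mon
                                 ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS Avg12Months
    FROM monthly
)
SELECT yr, mon, Amount, LastMonth, LastYearMonth, Avg12Months
FROM windowed
WHERE yr = 2013 AND mon = 5;  -- filter applied only after the window values exist
```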
