Distinct top 25 SQL server query - sql-server

I have a problem creating a query that combines DISTINCT and TOP. What I want is the top 25 rows by maximum value, distinct by the parameter column.
My query now is:
select distinct top 25
startDate, parameter, min, max, avg, amount_called
from
VisualisatieData.dbo.metric_data_by_day_parameter
where
startDate between '2013-05-30 08:46' and '2013-05-31 16:00'
and endDate between '2013-05-30 08:46' and '2013-05-31 16:00'
order by
max desc
This returns the following:
2013-05-31 01:08:26.000 P1 0 318386 1662 795
2013-05-31 00:01:36.000 P2 0 312325 1554 806
2013-05-31 00:01:36.000 P3 0 124827 25877 14
2013-05-30 08:49:19.000 P4 0 91992 11381 54
2013-05-31 01:05:54.000 P5 47 42410 497 499
2013-05-31 01:05:54.000 P6 16 42395 469 499
2013-05-31 01:05:55.000 P7 0 41380 244 498
2013-05-31 00:01:36.000 P8 328 35225 5305 8
2013-05-31 05:34:10.000 P4 16 12137 1208 17
2013-05-31 03:50:18.000 P9 0 11137 4687 23
2013-05-31 01:23:41.000 P10 391 8013 3237 95
2013-05-31 01:23:41.000 P11 375 7998 3174 98
2013-05-31 01:19:55.000 P12 453 7263 2437 58
2013-05-31 07:57:05.000 P13 2343 5639 3991 2
2013-05-31 03:32:21.000 P14 1687 5077 2993 9
2013-05-30 08:48:57.000 P15 984 5061 2419 12
2013-05-30 08:48:57.000 P16 984 5061 2419 12
2013-05-31 01:40:37.000 P15 1281 5045 2619 10
2013-05-31 01:40:37.000 P16 1281 5045 2619 10
2013-05-31 03:08:51.000 P17 562 4608 1302 18
2013-05-30 16:59:05.000 P18 4202 4202 4202 1
2013-05-30 16:59:05.000 P19 4202 4202 4202 1
2013-05-31 03:37:30.000 P20 875 4139 2681 18
2013-05-31 03:08:51.000 P21 547 3999 1203 18
2013-05-31 03:19:17.000 P22 31 3702 1399 5
This time there are 3 duplicate parameters (P4, P15 and P16), which I don't want. Depending on the time selection there are more duplicates. I think this doesn't work because the DISTINCT must be applied to the parameter column only.
I tried the following:
SELECT DISTINCT TOP 25 startDate, parameter, min, max, avg, amount_called
FROM
( SELECT startDate, endDate, parameter, min, max, avg, amount_called, ROW_NUMBER() over(partition by parameter order by max desc) subselect
FROM VisualisatieData.dbo.metric_data_by_day_parameter
) A
where startDate between '2013-05-30 08:46' and '2013-05-31 16:00' and endDate between '2013-05-30 08:46' and '2013-05-31 16:00'
ORDER BY max desc
But this doesn't work either; it returns the same as the first query.
I hope I described my problem clearly. If you want more information, ask me.
How can I change my query so I get a top 25 with maximum values and no duplicate parameters? Suggestions are appreciated!
Thanks in advance!

Try this:
select distinct top 25
startDate, parameter, min, max, avg, amount_called
from VisualisatieData.dbo.metric_data_by_day_parameter as tb
where
startDate between '2013-05-30 08:46' and '2013-05-31 16:00'
and endDate between '2013-05-30 08:46' and '2013-05-31 16:00'
and max = (select max(max)
from VisualisatieData.dbo.metric_data_by_day_parameter
where startDate between '2013-05-30 08:46' and '2013-05-31 16:00'
and endDate between '2013-05-30 08:46' and '2013-05-31 16:00'
and parameter = tb.parameter )
order by
max desc
Then the repeated parameters will be removed, because only the row holding each parameter's highest max is kept.

Try this:
SELECT TOP 25 startDate, parameter, min, max, avg, amount_called
FROM
( SELECT startDate, endDate, parameter, min, max, avg, amount_called, ROW_NUMBER() over(partition by parameter order by max desc) subselect
FROM VisualisatieData.dbo.metric_data_by_day_parameter
where startDate between '2013-05-30 08:46' and '2013-05-31 16:00' and endDate between '2013-05-30 08:46' and '2013-05-31 16:00'
) A
-- keep only the highest-max row of each parameter
WHERE subselect = 1
ORDER BY max desc;
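The ROW_NUMBER() idea only removes duplicates once the row number is actually filtered on. Here is a minimal sketch of that pattern, run through Python's built-in sqlite3 module so it can be tried without a SQL Server instance. The table name `metric` and the column name `max_value` are made up for the illustration, and SQLite's LIMIT stands in for SQL Server's TOP.

```python
import sqlite3


def top_per_parameter(rows, limit):
    """Return the highest-value row of each parameter, then the top `limit` of those."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, simplified table: only the columns needed for the illustration.
    con.execute("CREATE TABLE metric (startDate TEXT, parameter TEXT, max_value INTEGER)")
    con.executemany("INSERT INTO metric VALUES (?, ?, ?)", rows)
    query = """
        SELECT startDate, parameter, max_value
        FROM (
            SELECT startDate, parameter, max_value,
                   ROW_NUMBER() OVER (PARTITION BY parameter
                                      ORDER BY max_value DESC) AS rn
            FROM metric
        ) AS ranked
        WHERE rn = 1               -- one row per parameter: its highest value
        ORDER BY max_value DESC
        LIMIT ?                    -- SQLite spelling of SQL Server's TOP (n)
    """
    result = con.execute(query, (limit,)).fetchall()
    con.close()
    return result


# A few rows taken from the question's output (P4 appears twice).
sample = [
    ("2013-05-31 01:08:26", "P1", 318386),
    ("2013-05-30 08:49:19", "P4", 91992),
    ("2013-05-31 05:34:10", "P4", 12137),
    ("2013-05-31 03:50:18", "P9", 11137),
]
print(top_per_parameter(sample, 25))
```

Note that any date filter has to go inside the inner query, otherwise the row numbers are computed over rows that are later filtered out.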


Snowflake: Aggregating by a sliding window (the past 60 mins) for a dataset where the sampling frequency is non-uniform

I have data with a non-uniform sampling distribution. I want to aggregate the data on a rolling/sliding basis (the past 60 minutes).
To get an hourly average (partitioned by city), I used the following code, which worked.
SELECT *,
AVG(VALUE) OVER (PARTITION BY CITY, DATE_AND_HOUR ORDER BY TIMESTAMP)
FROM
(
SELECT *,
date_trunc('HOUR', TIMESTAMP) as DATE_AND_Hour
FROM SAMPLE_DATA
)
However, my desired output is as follows:
I know Snowflake doesn't support RANGE, and I can't specify ROWS BETWEEN in a window function because my sampling distribution is non-uniform.
I read some potential solutions on this page but they don't work in snowflake: sum last n days quantity using sql window function
Essentially, it's an analogous problem.
You can solve this with a self-join:
with data as (
select *
from temp_fh_wikipedia.public.wikipedia_2020
where title in ('San_Francisco', 'Los_Angeles')
and wiki='en'
and datehour > '2020-10-13'
)
select a.title, a.datehour, a.views, avg(b.views) avg_previous_5h
from data a
join (
select *
from data
) b
on a.title=b.title
and b.datehour between timestampadd(hour, -5, a.datehour) and a.datehour
group by 1, 2, 3
order by 1, 2
limit 100
Just change 'hour' to 'minute' if you want the last x minutes.
Firstly, what you show as "average" in your example is the "sum", and your first "Shanghai" result includes a "Beijing" result.
You have two options: build a fixed-size window dataset (build partials for each minute) and then use a fixed-size window frame over that, OR self-join and just aggregate those (as Felipe has shown).
If you have very dense data, you might find the former more performant; if you have sparse data, the latter approach should be faster, and it is definitely faster to code.
So the simple first:
with data(city, timestamp, value) as (
select column1, try_to_timestamp(column2, 'yyyy/mm/dd hh:mi'), column3 from values
('beijing', '2022/05/25 10:33', 22),
('beijing', '2022/05/25 10:37', 20),
('beijing', '2022/05/25 11:36', 29),
('beijing', '2022/05/26 11:36', 28),
('beijing', '2022/05/26 10:00', 21),
('shanghai', '2022/05/26 11:00', 33),
('shanghai', '2022/05/26 11:46', 35),
('shanghai', '2022/05/26 12:40', 37)
)
select a.*
,avg(b.value) as p60_avg
,count(b.value)-1 as p60_count
,sum(b.value) as p60_sum
from data as a
left join data as b
on a.city = b.city and b.timestamp between dateadd(hour, -1, a.timestamp) and a.timestamp
group by 1,2,3
order by 1,2
gives:
CITY      TIMESTAMP                VALUE  P60_AVG  P60_COUNT  P60_SUM
beijing   2022-05-25 10:33:00.000  22     22       0          22
beijing   2022-05-25 10:37:00.000  20     21       1          42
beijing   2022-05-25 11:36:00.000  29     24.5     1          49
beijing   2022-05-26 10:00:00.000  21     21       0          21
beijing   2022-05-26 11:36:00.000  28     28       0          28
shanghai  2022-05-26 11:00:00.000  33     33       0          33
shanghai  2022-05-26 11:46:00.000  35     34       1          68
shanghai  2022-05-26 12:40:00.000  37     36       1          72
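If you want to check the self-join logic without a Snowflake account, the following is a rough sketch that runs the same join through Python's built-in sqlite3 module. It is an approximation, not Snowflake code: `datetime(ts, '-1 hour')` is SQLite's stand-in for `dateadd(hour, -1, ...)`, and the column name `ts` is my own choice.

```python
import sqlite3

# Same sample readings as the answer; timestamps are ISO-8601 text so they compare correctly.
SAMPLE = [
    ("beijing", "2022-05-25 10:33:00", 22),
    ("beijing", "2022-05-25 10:37:00", 20),
    ("beijing", "2022-05-25 11:36:00", 29),
    ("beijing", "2022-05-26 11:36:00", 28),
    ("beijing", "2022-05-26 10:00:00", 21),
    ("shanghai", "2022-05-26 11:00:00", 33),
    ("shanghai", "2022-05-26 11:46:00", 35),
    ("shanghai", "2022-05-26 12:40:00", 37),
]


def past_hour_stats(rows):
    """Join each reading to the readings of the same city in the preceding 60 minutes."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (city TEXT, ts TEXT, value INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.city, a.ts, a.value,
               AVG(b.value)       AS p60_avg,
               COUNT(b.value) - 1 AS p60_count,   -- other readings in the window
               SUM(b.value)       AS p60_sum
        FROM data AS a
        LEFT JOIN data AS b
          ON a.city = b.city
         AND b.ts BETWEEN datetime(a.ts, '-1 hour') AND a.ts
        GROUP BY a.city, a.ts, a.value
        ORDER BY a.city, a.ts
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in past_hour_stats(SAMPLE):
    print(row)
```

The window is closed on both ends, so a reading taken exactly 60 minutes earlier is still counted.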
The dense version:
with data(city, timestamp, value) as (
select column1, try_to_timestamp(column2, 'yyyy/mm/dd hh:mi'), column3 from values
('beijing', '2022/05/25 10:33', 22),
('beijing', '2022/05/25 10:37', 20),
('beijing', '2022/05/25 11:36', 29),
('beijing', '2022/05/26 11:36', 28),
('beijing', '2022/05/26 10:00', 21),
('shanghai', '2022/05/26 11:00', 33),
('shanghai', '2022/05/26 11:46', 35),
('shanghai', '2022/05/26 12:40', 37)
), filled_time as (
select city,
dateadd(minute, row_number() over(partition by city order by null)-1, min_t) as timestamp
from (
select
city, min(timestamp) as min_t, max(timestamp) as max_t
from data
group by 1
), table(generator(ROWCOUNT => 10000))
qualify timestamp <= max_t
)
select
ft.city
,ft.timestamp
,avg(d.value) over (partition by ft.city order by ft.timestamp ROWS BETWEEN 60 PRECEDING AND current row ) as p60_avg -- partition so one city's window never reaches into another city's rows
from filled_time as ft
left join data as d
on ft.city = d.city and ft.timestamp = d.timestamp
order by 1,2;
gives:
CITY     TIMESTAMP                P60_AVG
beijing  2022-05-25 10:33:00.000  22
beijing  2022-05-25 10:34:00.000  22
beijing  2022-05-25 10:35:00.000  22
beijing  2022-05-25 10:36:00.000  22
beijing  2022-05-25 10:37:00.000  21
beijing  2022-05-25 10:38:00.000  21
beijing  2022-05-25 10:39:00.000  21
beijing  2022-05-25 10:40:00.000  21
beijing  2022-05-25 10:41:00.000  21
beijing  2022-05-25 10:42:00.000  21
beijing  2022-05-25 10:43:00.000  21
beijing  2022-05-25 10:44:00.000  21
beijing  2022-05-25 10:45:00.000  21
beijing  2022-05-25 10:46:00.000  21
snip...
And those "extra" rows could be dumped with a qualify
select
ft.city
,ft.timestamp
,avg(d.value) over (partition by ft.city order by ft.timestamp ROWS BETWEEN 60 PRECEDING AND current row ) as p60_avg -- partition so one city's window never reaches into another city's rows
--,count(b.value)-1 as p60_count
--,sum(b.value) as p60_sum
from filled_time as ft
left join data as d
on ft.city = d.city and ft.timestamp = d.timestamp
qualify d.value is not null
order by 1,2;
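The dense idea can also be tried locally. Below is a minimal sketch using Python's built-in sqlite3 module, with a recursive CTE standing in for Snowflake's `table(generator(...))` to build one row per minute per city. Two assumptions are baked in: readings fall on whole minutes, and there is at most one reading per city per minute. The window is partitioned by city so that rows from different cities never share a frame.

```python
import sqlite3

SAMPLE = [
    ("beijing", "2022-05-25 10:33:00", 22),
    ("beijing", "2022-05-25 10:37:00", 20),
    ("beijing", "2022-05-25 11:36:00", 29),
    ("beijing", "2022-05-26 11:36:00", 28),
    ("beijing", "2022-05-26 10:00:00", 21),
    ("shanghai", "2022-05-26 11:00:00", 33),
    ("shanghai", "2022-05-26 11:46:00", 35),
    ("shanghai", "2022-05-26 12:40:00", 37),
]


def dense_past_hour_avg(rows):
    """Fill every minute per city, then average over a fixed 61-row frame."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (city TEXT, ts TEXT, value INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE filled_time(city, ts, max_t) AS (
            -- anchor: first minute of each city
            SELECT city, MIN(ts), MAX(ts) FROM data GROUP BY city
            UNION ALL
            -- step: one more minute until the city's last reading
            SELECT city, datetime(ts, '+1 minute'), max_t
            FROM filled_time
            WHERE ts < max_t
        ),
        windowed AS (
            SELECT ft.city, ft.ts, d.value,
                   AVG(d.value) OVER (PARTITION BY ft.city ORDER BY ft.ts
                                      ROWS BETWEEN 60 PRECEDING AND CURRENT ROW) AS p60_avg
            FROM filled_time AS ft
            LEFT JOIN data AS d ON d.city = ft.city AND d.ts = ft.ts
        )
        SELECT city, ts, p60_avg
        FROM windowed
        WHERE value IS NOT NULL      -- drop the filler minutes again
        ORDER BY city, ts
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in dense_past_hour_avg(SAMPLE):
    print(row)
```

On this sample the averages agree with the self-join version, which is a handy sanity check when switching between the two approaches.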

Recursive query to use a date returned in initial query as limit in subsequent query

I have a business need to project when a specific task needs to be done based on the usage of a task.
For example, you need to change the oil in your car every 3000 miles. Some days you drive 300 miles, and other days you drive 500 miles. When you hit 3000, you change the oil, and restart the counter. Based on a projected usage table, return a set of all the oil change dates.
I could do this in a table-valued function or some other 'coded' solution.
But I thought I could do it in one statement, a recursive cte perhaps.
I'm having difficulties 'joining' the next date into the WHERE of the recursive part.
And SQL doesn't like 'TOP 1' in a recursive CTE at all. :)
I would like a set like this:
This is what I've got:
WITH cte_MilesMX (RateDate,RunningRateMiles)
AS
(
-- Initial query
SELECT TOP 1 *
FROM (
SELECT
RateDate,
SUM(RateMiles) OVER (ORDER BY RateDate) AS RunningRateMiles
FROM dbo.RatesbyDay
WHERE RateDate > '2020-01-01') q1
WHERE q1.RunningRateMiles >= 3000
UNION ALL
-- Recursive part
SELECT TOP 1 *
FROM (
SELECT
rbd.RateDate,
SUM(RateMiles) OVER (ORDER BY rbd.RateDate) AS RunningRateMiles
FROM dbo.RatesbyDay rbd
JOIN cte_MilesMX cte
ON 1 = 1
WHERE rbd.RateDate > cte.RateDate) q1
WHERE q1.RunningRateMiles >= 3000
)
SELECT *
FROM cte_MilesMX
If you want to fool with this, here is the example:
Any help would be greatly appreciated.
Thanks.
CREATE TABLE RatesbyDay(
RateDate DATE,
RateMiles INT);
INSERT INTO RatesbyDay VALUES ('2020-01-01',600)
INSERT INTO RatesbyDay VALUES ('2020-01-02',450)
INSERT INTO RatesbyDay VALUES ('2020-01-03',370)
INSERT INTO RatesbyDay VALUES ('2020-01-04',700)
INSERT INTO RatesbyDay VALUES ('2020-01-05',100)
INSERT INTO RatesbyDay VALUES ('2020-01-06',480)
INSERT INTO RatesbyDay VALUES ('2020-01-07',430)
INSERT INTO RatesbyDay VALUES ('2020-01-08',200)
INSERT INTO RatesbyDay VALUES ('2020-01-09',590)
INSERT INTO RatesbyDay VALUES ('2020-01-10',380)
INSERT INTO RatesbyDay VALUES ('2020-01-11',220)
INSERT INTO RatesbyDay VALUES ('2020-01-12',320)
INSERT INTO RatesbyDay VALUES ('2020-01-13',360)
INSERT INTO RatesbyDay VALUES ('2020-01-14',600)
INSERT INTO RatesbyDay VALUES ('2020-01-15',450)
INSERT INTO RatesbyDay VALUES ('2020-01-16',475)
INSERT INTO RatesbyDay VALUES ('2020-01-17',300)
INSERT INTO RatesbyDay VALUES ('2020-01-18',190)
INSERT INTO RatesbyDay VALUES ('2020-01-19',435)
INSERT INTO RatesbyDay VALUES ('2020-01-20',285)
INSERT INTO RatesbyDay VALUES ('2020-01-21',350)
INSERT INTO RatesbyDay VALUES ('2020-01-22',410)
INSERT INTO RatesbyDay VALUES ('2020-01-23',250)
INSERT INTO RatesbyDay VALUES ('2020-01-24',300)
INSERT INTO RatesbyDay VALUES ('2020-01-25',250)
INSERT INTO RatesbyDay VALUES ('2020-01-26',650)
INSERT INTO RatesbyDay VALUES ('2020-01-27',180)
INSERT INTO RatesbyDay VALUES ('2020-01-28',280)
INSERT INTO RatesbyDay VALUES ('2020-01-29',200)
INSERT INTO RatesbyDay VALUES ('2020-01-30',100)
INSERT INTO RatesbyDay VALUES ('2020-01-31',100)
-- this returns the 1st oil change assuming we just changed it on 1-1-2020
SELECT TOP 1 *
FROM (
SELECT
RateDate,
SUM(RateMiles) OVER (ORDER BY RateDate) AS RunningRateMiles
FROM dbo.RatesbyDay
WHERE RateDate > '2020-01-01') q1
WHERE q1.RunningRateMiles >= 3000
-- the above query returned 1-9-2020 as the oil change, so when is the next one.
SELECT TOP 1 *
FROM (
SELECT
RateDate,
SUM(RateMiles) OVER (ORDER BY RateDate) AS RunningRateMiles
FROM dbo.RatesbyDay
WHERE RateDate > '2020-01-09') q1
WHERE q1.RunningRateMiles >= 3000
-- etc. etc.
SELECT TOP 1 *
FROM (
SELECT
RateDate,
SUM(RateMiles) OVER (ORDER BY RateDate) AS RunningRateMiles
FROM dbo.RatesbyDay
WHERE RateDate > '2020-01-17') q1
WHERE q1.RunningRateMiles >= 3000
SELECT TOP 1 *
FROM (
SELECT
RateDate,
SUM(RateMiles) OVER (ORDER BY RateDate) AS RunningRateMiles
FROM dbo.RatesbyDay
WHERE RateDate > '2020-01-26') q1
WHERE q1.RunningRateMiles >= 3000
This isn't a recursive CTE, but it does do what you're trying to do. The technique goes by a couple of different names, usually either "Quirky Update" or "Ordered Update".
First thing, notice that I added two new columns to your table and a clustered index. They are in fact necessary, but if you are unwilling or unable to modify the existing table, this works just as well with a #TempTable.
For more detailed information, see Solving the Running Total and Ordinal Rank Problems (Rewritten)
Also... fair warning, this technique isn't without its detractors, because Microsoft doesn't guarantee that it will work as expected.
USE tempdb;
GO
IF OBJECT_ID('tempdb.dbo.RatesByDay', 'U') IS NOT NULL
BEGIN DROP TABLE tempdb.dbo.RatesByDay; END;
GO
CREATE TABLE tempdb.dbo.RatesByDay (
RateDate date NOT NULL
CONSTRAINT pk_RatesByDay PRIMARY KEY CLUSTERED (RateDate), -- clustered index is needed to control the direction of the update.
RateMiles int NOT NULL,
IsChangeDay bit NULL,
MilesSinceLastChange int NULL
);
GO
INSERT tempdb.dbo.RatesByDay (RateDate, RateMiles) VALUES
('2020-01-01',600),('2020-01-02',450),('2020-01-03',370),('2020-01-04',700),('2020-01-05',100),('2020-01-06',480),
('2020-01-07',430),('2020-01-08',200),('2020-01-09',590),('2020-01-10',380),('2020-01-11',220),('2020-01-12',320),
('2020-01-13',360),('2020-01-14',600),('2020-01-15',450),('2020-01-16',475),('2020-01-17',300),('2020-01-18',190),
('2020-01-19',435),('2020-01-20',285),('2020-01-21',350),('2020-01-22',410),('2020-01-23',250),('2020-01-24',300),
('2020-01-25',250),('2020-01-26',650),('2020-01-27',180),('2020-01-28',280),('2020-01-29',200),('2020-01-30',100),
('2020-01-31',100);
--=====================================================================================================================
DECLARE
@RunningMiles int = 0,
@Anchor date;
UPDATE rbd SET
@RunningMiles = rbd.MilesSinceLastChange = CASE WHEN @RunningMiles < 3000 THEN @RunningMiles ELSE 0 END + rbd.RateMiles,
rbd.IsChangeDay = CASE WHEN @RunningMiles < 3000 THEN 0 ELSE 1 END,
@Anchor = rbd.RateDate
FROM
dbo.RatesByDay rbd WITH (TABLOCKX, INDEX (1))
WHERE 1 = 1
AND rbd.RateDate > '2020-01-01'
OPTION (MAXDOP 1);
-------------------------------------
SELECT * FROM dbo.RatesByDay rbd;
And the results...
RateDate RateMiles IsChangeDay MilesSinceLastChange
---------- ----------- ----------- --------------------
2020-01-01 600 NULL NULL
2020-01-02 450 0 450
2020-01-03 370 0 820
2020-01-04 700 0 1520
2020-01-05 100 0 1620
2020-01-06 480 0 2100
2020-01-07 430 0 2530
2020-01-08 200 0 2730
2020-01-09 590 1 3320
2020-01-10 380 0 380
2020-01-11 220 0 600
2020-01-12 320 0 920
2020-01-13 360 0 1280
2020-01-14 600 0 1880
2020-01-15 450 0 2330
2020-01-16 475 0 2805
2020-01-17 300 1 3105
2020-01-18 190 0 190
2020-01-19 435 0 625
2020-01-20 285 0 910
2020-01-21 350 0 1260
2020-01-22 410 0 1670
2020-01-23 250 0 1920
2020-01-24 300 0 2220
2020-01-25 250 0 2470
2020-01-26 650 1 3120
2020-01-27 180 0 180
2020-01-28 280 0 460
2020-01-29 200 0 660
2020-01-30 100 0 760
2020-01-31 100 0 860
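To make the rule that the ordered update applies easier to follow, here is a small plain-Python sketch of the same running total with reset. It is not the T-SQL technique itself, just the arithmetic, and the names `change_days`, `interval` and `after` are my own. Like the UPDATE above, it skips 2020-01-01 and starts counting from the next day.

```python
from datetime import date, timedelta

# Daily miles from the question, 2020-01-01 through 2020-01-31.
MILES = [600, 450, 370, 700, 100, 480, 430, 200, 590, 380, 220, 320, 360, 600, 450, 475,
         300, 190, 435, 285, 350, 410, 250, 300, 250, 650, 180, 280, 200, 100, 100]
RATES = [(date(2020, 1, 1) + timedelta(days=i), miles) for i, miles in enumerate(MILES)]


def change_days(rates, interval=3000, after=date(2020, 1, 1)):
    """Return (day, miles since last change) for each day the interval is reached."""
    running = 0
    changes = []
    for day, miles in rates:          # rates must be in date order
        if day <= after:
            continue                  # only count days after the last known change
        running += miles
        if running >= interval:
            changes.append((day, running))
            running = 0               # restart the counter after the change
    return changes


for day, miles in change_days(RATES):
    print(day.isoformat(), miles)
```

The three days it reports are the rows flagged with IsChangeDay = 1 in the results above.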
You can do this with a recursive query:
with
data as (select r.*, row_number() over(order by ratedate) rn from ratesbyday r),
cte as (
select d.*, ratemiles total, ratemiles newtotal from data d where rn = 1
union all
select d.*,
c.newtotal + d.ratemiles,
case when c.newtotal < 3000 and c.newtotal + d.ratemiles >= 3000 then 0 else c.newtotal + d.ratemiles end
from cte c
inner join data d on d.rn = c.rn + 1
)
select ratedate, ratemiles, total
from cte
where newtotal = 0
order by ratedate
The query starts by enumerating the rows. Then it iteratively walks them, starting from the "first" one; every time the running total reaches or exceeds the 3000-mile threshold, we reset the running miles count. We can then filter on the "reset" rows.
Demo on DB Fiddle:
ratedate | ratemiles | total
:--------- | --------: | ----:
2020-01-07 | 430 | 3130
2020-01-15 | 450 | 3120
2020-01-25 | 250 | 3245
If there may be more than 100 rows in your dataset, you need to add option (maxrecursion 0) at the very end of the query.
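The recursive query can be tried outside SQL Server as well. This is a sketch using Python's built-in sqlite3 module; the SQL is kept as close to the answer as SQLite allows (it needs the RECURSIVE keyword and an explicit column list), and the threshold is passed in as a named parameter, which is my own addition.

```python
import sqlite3

# Daily miles from the question, 2020-01-01 through 2020-01-31.
MILES = [600, 450, 370, 700, 100, 480, 430, 200, 590, 380, 220, 320, 360, 600, 450, 475,
         300, 190, 435, 285, 350, 410, 250, 300, 250, 650, 180, 280, 200, 100, 100]


def oil_changes(miles, interval=3000):
    """Walk the days one by one, resetting the running total when it reaches `interval`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ratesbyday (ratedate TEXT, ratemiles INTEGER)")
    con.executemany(
        "INSERT INTO ratesbyday VALUES (?, ?)",
        [(f"2020-01-{day:02d}", m) for day, m in enumerate(miles, start=1)],
    )
    query = """
        WITH RECURSIVE
        data AS (
            SELECT ratedate, ratemiles,
                   ROW_NUMBER() OVER (ORDER BY ratedate) AS rn
            FROM ratesbyday
        ),
        cte(ratedate, ratemiles, rn, total, newtotal) AS (
            SELECT ratedate, ratemiles, rn, ratemiles, ratemiles
            FROM data WHERE rn = 1
            UNION ALL
            SELECT d.ratedate, d.ratemiles, d.rn,
                   c.newtotal + d.ratemiles,
                   CASE WHEN c.newtotal < :interval
                             AND c.newtotal + d.ratemiles >= :interval
                        THEN 0                       -- threshold reached: reset
                        ELSE c.newtotal + d.ratemiles END
            FROM cte AS c
            JOIN data AS d ON d.rn = c.rn + 1
        )
        SELECT ratedate, ratemiles, total
        FROM cte
        WHERE newtotal = 0
        ORDER BY ratedate
    """
    result = con.execute(query, {"interval": interval}).fetchall()
    con.close()
    return result


for row in oil_changes(MILES):
    print(row)
```

Unlike the ordered-update answer, this version counts the 600 miles of 2020-01-01, which is why its change dates differ.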
In this instance I would use a rolling agg and then use the mod operator to find the points where it hits the 3000 interval.
Using the table desc and inserts above here is an example:
-- When the mod value "resets" then the oil change is due, check this using LAG
SELECT
agg.RateDate
,agg.RateMiles
,agg.MilesAgg
,agg.MilesAgg%3000 AS ModValue
,CASE WHEN agg.MilesAgg%3000 < LAG(agg.MilesAgg) OVER(ORDER BY agg.RateDate)%3000
THEN 'Due'
ELSE 'NotDue'
END
FROM
(
--Get the rolling total of miles
SELECT
rbd.RateDate
,rbd.RateMiles
,SUM(rbd.RateMiles) OVER(ORDER BY rbd.RateDate ROWS UNBOUNDED PRECEDING) AS MilesAgg
FROM #RatesByDay rbd
) agg
Results, first day is counting the 600 miles as being AFTER the oil change
RateDate Mi MiAgg Mod IsDue?
--------------------------------------
2020-01-01 600 600 600 NotDue
2020-01-02 450 1050 1050 NotDue
2020-01-03 370 1420 1420 NotDue
2020-01-04 700 2120 2120 NotDue
2020-01-05 100 2220 2220 NotDue
2020-01-06 480 2700 2700 NotDue
2020-01-07 430 3130 130 Due
2020-01-08 200 3330 330 NotDue
2020-01-09 590 3920 920 NotDue
2020-01-10 380 4300 1300 NotDue
2020-01-11 220 4520 1520 NotDue
2020-01-12 320 4840 1840 NotDue
2020-01-13 360 5200 2200 NotDue
2020-01-14 600 5800 2800 NotDue
2020-01-15 450 6250 250 Due
2020-01-16 475 6725 725 NotDue
2020-01-17 300 7025 1025 NotDue
2020-01-18 190 7215 1215 NotDue
2020-01-19 435 7650 1650 NotDue
2020-01-20 285 7935 1935 NotDue
2020-01-21 350 8285 2285 NotDue
2020-01-22 410 8695 2695 NotDue
2020-01-23 250 8945 2945 NotDue
2020-01-24 300 9245 245 Due
2020-01-25 250 9495 495 NotDue
2020-01-26 650 10145 1145 NotDue
2020-01-27 180 10325 1325 NotDue
2020-01-28 280 10605 1605 NotDue
2020-01-29 200 10805 1805 NotDue
2020-01-30 100 10905 1905 NotDue
2020-01-31 100 11005 2005 NotDue
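A quick way to experiment with the modulo idea is the sketch below, again using Python's built-in sqlite3 module rather than SQL Server. The LAG is computed in an inner query and the `%` comparison is done outside it; the column names `miles_agg` and `prev_agg` are my own.

```python
import sqlite3

# Daily miles from the question, 2020-01-01 through 2020-01-31.
MILES = [600, 450, 370, 700, 100, 480, 430, 200, 590, 380, 220, 320, 360, 600, 450, 475,
         300, 190, 435, 285, 350, 410, 250, 300, 250, 650, 180, 280, 200, 100, 100]


def due_days(miles, interval=3000):
    """Flag days where the cumulative total wraps past a multiple of `interval`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ratesbyday (ratedate TEXT, ratemiles INTEGER)")
    con.executemany(
        "INSERT INTO ratesbyday VALUES (?, ?)",
        [(f"2020-01-{day:02d}", m) for day, m in enumerate(miles, start=1)],
    )
    query = """
        WITH agg AS (
            SELECT ratedate,
                   SUM(ratemiles) OVER (ORDER BY ratedate
                                        ROWS BETWEEN UNBOUNDED PRECEDING
                                                 AND CURRENT ROW) AS miles_agg
            FROM ratesbyday
        ),
        with_prev AS (
            SELECT ratedate, miles_agg,
                   LAG(miles_agg) OVER (ORDER BY ratedate) AS prev_agg
            FROM agg
        )
        SELECT ratedate, miles_agg
        FROM with_prev
        WHERE miles_agg % :interval < prev_agg % :interval   -- remainder dropped: wrapped
        ORDER BY ratedate
    """
    result = con.execute(query, {"interval": interval}).fetchall()
    con.close()
    return result


for row in due_days(MILES):
    print(row)
```

One thing to be aware of: the modulo carries any overshoot into the next interval, whereas the recursive answer restarts from zero after each change. That is why the third change here lands on 2020-01-24 instead of 2020-01-25. The check also assumes no single day adds a full interval or more.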

T-SQL Count of Records in Status for Previous Months

I have a T-SQL Quotes table and need to be able to count how many quotes were in an open status during past months.
The dates I have to work with are an 'Add_Date' timestamp and an 'Update_Date' timestamp. Once a quote is put into a 'Won' or 'Loss' columns with a value of '1' in that column it can no longer be updated. Therefore, the 'Update_Date' effectively becomes the Closed_Status timestamp.
Here are a few example records:
Quote_No Add_Date Update_Date Open_Quote Win Loss
001 01-01-2016 NULL 1 0 0
002 01-01-2016 3-1-2016 0 1 0
003 01-01-2016 4-1-2016 0 0 1
Here's a link to all the data here:
https://drive.google.com/open?id=0B4xdnV0LFZI1T3IxQ2ZKRDhNd1k
I asked this question previously this year and have been using the following code:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
)
select format(dateadd(month, n.n, q.add_date), 'yyyy-MM') as yyyymm,
count(*) as Open_Quote_Count
from quotes q join
n
on (closed_status = 1 and dateadd(month, n.n, q.add_date) <= q.update_date) or
(closed_status = 0 and dateadd(month, n.n, q.add_date) <= getdate())
group by format(dateadd(month, n.n, q.add_date), 'yyyy-MM')
order by yyyymm;
The problem is that this code returns a cumulative value. So January was fine, but then February is really Jan + Feb, March is Jan + Feb + March, and so on. It took me a while to discover this, and the numbers returned are now way, way off, so I'm trying to correct them.
From the full data set the results of this code are:
Year-Month Open_Quote_Count
2017-01 153
2017-02 265
2017-03 375
2017-04 446
2017-05 496
2017-06 560
2017-07 609
The desired result would be how many quotes were in an open status during that particular month, not the cumulative :
Year-Month Open_Quote_Count
2017-01 153
2017-02 112
2017-03 110
2017-04 71
Thank you in advance for your help!
Unless I am missing something, LAG() would be a good fit here
Example
Declare @YourTable Table ([Year-Month] varchar(50),[Open_Quote_Count] int)
Insert Into @YourTable Values
('2017-01',153)
,('2017-02',265)
,('2017-03',375)
,('2017-04',446)
,('2017-05',496)
,('2017-06',560)
,('2017-07',609)
Select *
,NewValue = [Open_Quote_Count] - lag([Open_Quote_Count],1,0) over (Order by [Year-Month])
From @YourTable --<< Replace with your initial query
Returns
Year-Month Open_Quote_Count NewValue
2017-01 153 153
2017-02 265 112
2017-03 375 110
2017-04 446 71
2017-05 496 50
2017-06 560 64
2017-07 609 49
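For anyone who wants to try the LAG() differencing without SQL Server, here is a small sketch using Python's built-in sqlite3 module. The table name `monthly` and the snake_case column names are mine; the figures are the cumulative counts from the question.

```python
import sqlite3

CUMULATIVE = [
    ("2017-01", 153), ("2017-02", 265), ("2017-03", 375), ("2017-04", 446),
    ("2017-05", 496), ("2017-06", 560), ("2017-07", 609),
]


def monthly_increments(rows):
    """Turn a cumulative series into per-month values by subtracting the previous row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE monthly (yyyymm TEXT, open_quote_count INTEGER)")
    con.executemany("INSERT INTO monthly VALUES (?, ?)", rows)
    query = """
        SELECT yyyymm,
               open_quote_count,
               -- third LAG argument (0) is the default for the first row
               open_quote_count - LAG(open_quote_count, 1, 0)
                                  OVER (ORDER BY yyyymm) AS new_value
        FROM monthly
        ORDER BY yyyymm
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in monthly_increments(CUMULATIVE):
    print(row)
```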

Group by and sum based on column values without sum() over()?

We have a table [Kpis] that looks like the following:
RawId EmpId Date Hour Min KpiValue KpiName
106 ABC123 20160310 8 0 3 Kpi1
124 ABC123 20160310 8 0 65 Kpi1
121 ABC123 20160310 8 15 12 Kpi2
109 ABC109 20160310 8 0 34 Kpi2
112 ABC908 20160310 9 5 3 Kpi1
118 ABC907 20160310 8 30 24 Kpi1
115 ABC123 20160310 8 15 54 Kpi1
I would like to group by EmpId, KpiName, Date, Hour. So, for example, with this data, Kpi1 for EmpId ABC123 at Hour 8 would be 122.
So I tried using the CASE statement, but the result is incorrect. I haven't checked the actual totals in the result, but the sums should be correct. It's the format of the result that's incorrect; every empid has two rows: one for Kpi1 and one for Kpi2.
select empid,
case kpiname when 'Kpi1' then sum(kpivalue) end as 'Kpi1',
case kpiname when 'Kpi2' then sum(kpivalue) end as 'Kpi2'
from
[Kpis]
where kpiname in ('Kpi1', 'Kpi2')
and date = 20160310 and hour = 8
group by empid, kpiname, hour
How can I use the Case statement to fix the results?
Thanks.
Put the CASE inside your SUM, so that for each KpiName you only sum the relevant values.
SELECT
EmpId,
[Hour],
SUM(
CASE
WHEN KpiName = 'Kpi1' THEN KpiValue
ELSE 0
END
) Kpi1,
SUM(
CASE
WHEN KpiName = 'Kpi2' THEN KpiValue
ELSE 0
END
) Kpi2
FROM
Kpis
GROUP BY
EmpId,
[Hour]
This produces this output
EmpId Hour Kpi1 Kpi2
ABC109 8 0 34
ABC123 8 122 12
ABC907 8 24 0
ABC908 9 3 0
The SUM function has to be outside of the CASE:
select empid,
sum(case kpiname when 'Kpi1' then kpivalue end) as 'Kpi1',
sum(case kpiname when 'Kpi2' then kpivalue end) as 'Kpi2'
from
[Kpis]
where kpiname in ('Kpi1', 'Kpi2')
and date = 20160310 and hour = 8
group by empid, hour
You can also do this with the PIVOT functionality, which I believe is what you're actually trying to accomplish.
SELECT
*
FROM (
SELECT
EmpId,
KpiName,
[Hour],
KpiValue
FROM
Kpis
) SourceTable
PIVOT (
SUM(KpiValue)
FOR KpiName
IN ([Kpi1],[Kpi2])
) PivotTable
Which gives this output. Note the NULLs as opposed to the zeros, correctly showing the lack of data.
EmpId Hour Kpi1 Kpi2
ABC109 8 NULL 34
ABC123 8 122 12
ABC907 8 24 NULL
ABC908 9 3 NULL
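The difference between the zero-filled and NULL-filled outputs comes down to whether the CASE has an ELSE 0. Here is a minimal sketch of the conditional aggregation with no ELSE branch, run through Python's built-in sqlite3 module on the sample rows from the question. The snake_case column names are my own, and this is SQLite rather than T-SQL.

```python
import sqlite3

# (RawId, EmpId, Date, Hour, Min, KpiValue, KpiName) from the question.
KPI_ROWS = [
    (106, "ABC123", 20160310, 8, 0, 3, "Kpi1"),
    (124, "ABC123", 20160310, 8, 0, 65, "Kpi1"),
    (121, "ABC123", 20160310, 8, 15, 12, "Kpi2"),
    (109, "ABC109", 20160310, 8, 0, 34, "Kpi2"),
    (112, "ABC908", 20160310, 9, 5, 3, "Kpi1"),
    (118, "ABC907", 20160310, 8, 30, 24, "Kpi1"),
    (115, "ABC123", 20160310, 8, 15, 54, "Kpi1"),
]


def kpis_by_employee_hour(rows):
    """One row per employee and hour, one column per KPI."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE kpis (raw_id INTEGER, emp_id TEXT, kpi_date INTEGER, "
        "kpi_hour INTEGER, kpi_min INTEGER, kpi_value INTEGER, kpi_name TEXT)"
    )
    con.executemany("INSERT INTO kpis VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    query = """
        SELECT emp_id, kpi_hour,
               -- no ELSE branch: non-matching rows become NULL and SUM ignores them
               SUM(CASE WHEN kpi_name = 'Kpi1' THEN kpi_value END) AS kpi1,
               SUM(CASE WHEN kpi_name = 'Kpi2' THEN kpi_value END) AS kpi2
        FROM kpis
        GROUP BY emp_id, kpi_hour     -- kpi_name must NOT be in the GROUP BY
        ORDER BY emp_id, kpi_hour
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in kpis_by_employee_hour(KPI_ROWS):
    print(row)
```

With `ELSE 0` added to each CASE, the missing values would come back as 0 instead of None (NULL).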

Combining the rows column values in ASP.NET

SELECT name,
batchno,
recievedeggs,
settingqnty,
CONVERT(VARCHAR(50), settingdate, 103) AS settingdate,
setteroutput,
( 100 * setteroutput / settingqnty ) AS [SetterHatch%],
hathersettingqty,
hatcheroutput,
culls,
[hatcherhatch%],
( 100 * hatcheroutput / Sum(settingqnty) ) AS [Hatch%],
CONVERT(VARCHAR(50), pulloutdate, 103) AS pulloutdate,
hatcher
FROM (SELECT SH.name,
MS.batchno,
MS.recievedeggs,
MS.quantity AS SettingQnty,
MS.settingdate,
SD.remainingqnty AS SetterOutput,
MH.settingqntity AS HatherSettingQty,
MH.saleablechicks AS HatcherOutput,
MH.culls,
Round(MH.hatchpercent, 2) AS [HatcherHatch%],
MH.pulloutdate,
SH1.name AS Hatcher
FROM k_hm_settergetterallocationdet MS
INNER JOIN k_hm_setterdetails SD
ON MS.sno = SD.id
INNER JOIN k_hm_hatcherdetails HD
ON SD.sno = HD.id
INNER JOIN k_hm_masterhatcherdet MH
ON HD.sno = MH.id
INNER JOIN k_hm_gettersetterdet SH
ON MS.name = SH.sno
INNER JOIN k_hm_gettersetterdet SH1
ON HD.hatchername = SH1.sno
WHERE settingdate BETWEEN @fromdate AND @todate)a
GROUP BY a.settingdate,
a.name,
a.recievedeggs,
a.settingdate,
a.setteroutput,
a.hathersettingqty,
a.hatcheroutput
ORDER BY a.settingdate DESC
Using this, I am getting output like:
S.No. SetterName SettingDate FlockNo Rec.Eggs SettingEggs SetterO/P Setter% HatcherName HatcherQnty PulloutDate HatcherO/P Culls Hatcher% Total Hatch%
1 Setter1 01/06/2014 Batch10 2500 2150 2136 99 Hatcher1 2136 22/06/2014 2115 15 99.02 98
2 Setter1 01/06/2014 Batch10 2500 2355 2341 99 Hatcher1 2341 22/06/2014 2314 21 98.85 98
3 Setter2 01/06/2014 Batch10 2450 2255 2241 99 Hatcher1 2241 22/06/2014 2221 20 99.11 98
but I want output like this:
S.No. SetterName SettingDate FlockNo Rec.Eggs SettingEggs SetterO/P Setter% HatcherName HatcherQnty PulloutDate HatcherO/P Culls Hatcher% Total Hatch%
1 Setter1,Setter3 01/06/2014 Batch10 7450 6760 6781 99 Hatcher1 6781 22/06/2014 6650 15 99.02 98
I have tried it this way, but it's not correct.
DECLARE @t VARCHAR(Max)
Select @t = ISNULL(@t + ',' + SetterName, SetterName) from hk
where FlockNo in (select FlockNo from hk)
(select @t as setterName,SettingDate,FlockNo,sum(rec) as rec,SUM(SettingEggs)as se,sum([SetterO/P])assop,([Setter%])asse,HatcherName,
Sum(HatcherQnty)HQ,PulloutDate,sum([HatcherO/P]) as hatchero,min(Culls)as culls,max([Hatcher%])as hat,
(Total)
from hk
group by FlockNo,[Setter%],SettingDate,HatcherName,PulloutDate,Total,FlockNo)
I am getting this result:
Setter1,Setter2,Setter3 2014-01-06 Batch10 7450 6760 6718 99 Hatcher1 6718 2014-06-22 6650 15 99 98
