Related
I apologise for the title of the question, it isn't very clear, but I can't think of a better way of describing it in words, the data should speak for itself.
I have a table of data where I need to know the count of rows that are the same depending on a value, but also taking into account the sequence they are currently in. The table is much larger than this, but these are the columns which are relevant.
Id | MinCode | MaxCode | ExpectedResult
----------------------------------------------------
1 | 00001.000001 | 00001.000001 | 2
2 | 00001.000001 | 00001.000002 | 2
3 | 00002.00001a | 00002.00001a | 3
4 | 00002.00001a | 00002.00001b | 3
5 | 00002.00001a | 00002.00001c | 3
6 | 00002.000002 | 00002.000002 | 1
7 | 00002.00003a | 00002.00003a | 2
8 | 00002.00003a | 00002.00003b | 2
9 | 00002.000002 | 00002.000004 | 1
10 | 00003.000001 | 00003.000001 | 1
Note: Id is also the order in this example, I just didn't see the point of an extra column with the same data.
I have tried several versions using COUNT, ROW_NUMBER/RANK, PARTITION and GROUP BY without getting the ExpectedResult values. The issue is with Ids 6 and 9 as my ExpectedResult value always combines them to produce a 2 which I partially understand as these functions don't take into account the ordering of the data. I believe I'm close, but my T-SQL is pretty rusty these days!
I know I could get this value with processing this data set through a CURSOR, but I'd like to avoid that.
The key here is to create a sequence id you can use in a window function to get the count. You won't be able to do it in one query because window functions can't be combined, but you can pull it off with a subquery or CTE.
To determine the sequence number for a row, you need to count the number of times the group key has changed in the preceding rows. So to determine the changes, create an inner query that checks if the current group key is different from the previous by using the lag window function. Use a case statement that results in 1 or 0 depending on if the lagged value is different from the current. The outer query then just has to sum up the values for all rows preceding up to the current.
Once you have the sequence number, you can use a count window function to count all the rows with matching numbers.
WITH src AS ( -- cte to mimic table.
SELECT *
FROM (VALUES
(1, N'00001.000001', N'00001.000001', 2),
/* ... test data ... */
(10, N'00003.000001', N'00003.000001', 1)
) [src] ( [Id],[MinCode],[MaxCode],[ExpectedResult] )
)
SELECT src.Id, MinCode, MaxCode, ExpectedResult
, COUNT(1) OVER (PARTITION BY seq.SequenceId) [Result]
FROM src
INNER JOIN (
SELECT x.Id, SUM(x.IsNew) OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING) [SequenceId]
FROM (
SELECT Id, CASE WHEN LAG(MinCode) OVER (ORDER BY Id) <> MinCode THEN 1 ELSE 0 END [IsNew]
FROM src
) x
) seq ON seq.Id = src.Id
ORDER BY Id
Here's another possibility:
/* Testing Data */
DECLARE #Data table (
Id int, MinCode varchar(20), MaxCode varchar(20), ExpectedResult int
);
INSERT INTO #Data VALUES
( 1 , '00001.000001', '00001.000001', 2 ),
( 2 , '00001.000001', '00001.000002', 2 ),
( 3 , '00002.00001a', '00002.00001a', 3 ),
( 4 , '00002.00001a', '00002.00001b', 3 ),
( 5 , '00002.00001a', '00002.00001c', 3 ),
( 6 , '00002.000002', '00002.000002', 1 ),
( 7 , '00002.00003a', '00002.00003a', 2 ),
( 8 , '00002.00003a', '00002.00003b', 2 ),
( 9 , '00002.000002', '00002.000004', 1 ),
( 10, '00003.000001', '00003.000001', 1 );
/* Get count of MinCode rows that are the same, taking into account their sequence */
WITH cte AS (
SELECT
Id,
MinCode,
CASE WHEN
LAG ( MinCode, 1 ) OVER ( ORDER BY Id ) = MinCode
OR
LEAD ( MinCode, 1 ) OVER ( ORDER BY Id ) = MinCode
THEN 1
ELSE 0
END AS SeqMatch
FROM #Data
)
SELECT
Id, MinCode, MaxCode, ExpectedResult,
CASE WHEN MatchCount = 0 THEN 1 ELSE MatchCount END AS DerivedResult
FROM #Data d
OUTER APPLY (
SELECT SUM( SeqMatch ) AS MatchCount FROM cte WHERE cte.MinCode = d.MinCode
) AS x;
Returns
+----+--------------+--------------+----------------+---------------+
| Id | MinCode | MaxCode | ExpectedResult | DerivedResult |
+----+--------------+--------------+----------------+---------------+
| 1 | 00001.000001 | 00001.000001 | 2 | 2 |
| 2 | 00001.000001 | 00001.000002 | 2 | 2 |
| 3 | 00002.00001a | 00002.00001a | 3 | 3 |
| 4 | 00002.00001a | 00002.00001b | 3 | 3 |
| 5 | 00002.00001a | 00002.00001c | 3 | 3 |
| 6 | 00002.000002 | 00002.000002 | 1 | 1 |
| 7 | 00002.00003a | 00002.00003a | 2 | 2 |
| 8 | 00002.00003a | 00002.00003b | 2 | 2 |
| 9 | 00002.000002 | 00002.000004 | 1 | 1 |
| 10 | 00003.000001 | 00003.000001 | 1 | 1 |
+----+--------------+--------------+----------------+---------------+
I understand you want this
SELECT count(*) over (partition by MinCode ) as result
FROM test
order BY id
Gives
create table test (Id int, MinCode VARCHAR(30), MaxCode VARCHAR(30), ExpectedResult INT);
INSERT INTO test VALUES (1 , '00001.000001', '00001.000001', 2);
INSERT INTO test VALUES (2 , '00001.000001', '00001.000002', 2);
INSERT INTO test VALUES (3 , '00002.00001a', '00002.00001a', 3);
INSERT INTO test VALUES (4 , '00002.00001a', '00002.00001b', 3);
INSERT INTO test VALUES (5 , '00002.00001a', '00002.00001c', 3);
INSERT INTO test VALUES (6 , '00002.000002', '00002.000002', 1);
INSERT INTO test VALUES (7 , '00002.00003a', '00002.00003a', 2);
INSERT INTO test VALUES (8 , '00002.00003a', '00002.00003b', 2);
INSERT INTO test VALUES (9 , '00002.000002', '00002.000004', 1);
INSERT INTO test VALUES (10 , '00003.000001', '00003.000001', 1);
SELECT Id,MinCode,MaxCode, ExpectedResult AS You,
(count(*) over (partition by MinCode )) as Me
FROM test
order BY id
Id MinCode MaxCode You Me
1 00001.000001 00001.000001 2 2
2 00001.000001 00001.000002 2 2
3 00002.00001a 00002.00001a 3 3
4 00002.00001a 00002.00001b 3 3
5 00002.00001a 00002.00001c 3 3
6 00002.000002 00002.000002 1 2
7 00002.00003a 00002.00003a 2 2
8 00002.00003a 00002.00003b 2 2
9 00002.000002 00002.000004 1 2
10 00003.000001 00003.000001 1 1
I cannot summarize numbers in the table (SQL-Server) after pivoting and I will be very grateful for your advice.
Better if I explain the problem on the example:
Existing table:
+-------+-----------+-----------+-------------------+
| # | $$$$$ | Fire | Water |
+-------+-----------+-----------+-------------------+
| 1 | 5 | 1 | 5 |
| 1 | 4 | 1 | 5 |
| 1 | 10 | 1 | 5 |
| 2 | 3 | 3 | 8 |
| 2 | 4 | 3 | 8 |
+-------+-----------+-----------+-------------------+
Desired output:
+-------+-----------+-----------+-------------------+
| # | $$$$$ | Fire | Water |
+-------+-----------+-----------+-------------------+
| 1 | 19 | 1 | 5 |
| 2 | 7 | 3 | 8 |
+-------+-----------+-----------+-------------------+
I tend to believe that I already tried all the solutions I found with summarizing and grouping by, but it was not solved, so I rely on you. Thanks in advance. The code I used to create the table:
WITH Enerc AS
(
SELECT
a1.[#],
a1.[$$$$$],
a2.[cause_of_loss]
FROM
data1 AS a1
LEFT JOIN
data2 AS a2 ON a1.[id] = a2.[id]
)
SELECT *
FROM Enerc
PIVOT
(SUM(gross_claim) FOR [cause_of_loss] IN ([Fire], [Water])) AS PivotTable;
No need to pivot. Your desired result should be got by grouping and using SUM:
SELECT
a1.[#],
SUM(a1.[$$$$$]),
a1.[Fire]
a1.[Water]
from data1 as a1
group by a1.[#], a1.[Fire], a1.[Water]
Let me show an example:
DECLARE #Hello TABLE
(
[#] INT,
[$$$$$] INT,
[Fire] INT,
[Water] INT
)
INSERT INTO #Hello
(
#,
[$$$$$],
Fire,
Water
)
VALUES
( 1, -- # - int
5, -- $$$$$ - int
1, -- Fire - int
5 -- Water - int
)
, (1, 4, 1, 5)
, (1, 10, 1, 5)
, (2, 3, 3, 8)
, (2, 4, 3, 8)
SELECT
h.#,
SUM(h.[$$$$$]),
h.Fire,
h.Water
FROM #Hello h
GROUP BY h.#, h.Fire, h.Water
try group by after the pivot.
With Enerc as
(SELECT
a1.[#],
a1.[$$$$$],
a2.[cause_of_loss]
from data1 as a1
left join data2 as a2
on a1.[id] = a2.[id]
)
select *
into tmp
from Enerc
PIVOT
(sum(gross_claim)
FOR [cause_of_loss] in (
[Fire], [Water]))
as PivotTable
select [#], sum([$$$$$])as [$$$$$], Fire, Water
from #tmp
group by [#],Fire, Water
EDIT: in case of permission denied:
With Enerc as
(SELECT
a1.[#],
a1.[$$$$$],
a2.[cause_of_loss]
from data1 as a1
left join data2 as a2
on a1.[id] = a2.[id]
),phase2 as(
select *
from Enerc
PIVOT
(sum(gross_claim)
FOR [cause_of_loss] in (
[Fire], [Water]))
as PivotTable)
select [#], sum([$$$$$])as [$$$$$], Fire, Water
from phase2
group by [#],Fire, Water
My script
SELECT ans.Questions_Id,ans.Answer_Numeric,ans.Option_Id, opt.Description, count(ans.Option_Id) as [Count]
FROM Answers ans
LEFT OUTER JOIN Questions que
ON ans.Questions_Id = que.Id
LEFT OUTER JOIN Options opt
ON ans.Option_Id = opt.Id
WHERE que.Survey_Id = 1
and ans.Questions_Id = 1
GROUP By ans.Questions_Id,ans.Answer_Numeric,ans.Option_Id, opt.Description
ORDER BY 2, 5 desc
I am trying to get the top number responses (Description) for each Answer_Numeric. The result at the moment looks like this:
| Questions_Id | Answer_Numeric | Option_Id | Description | Count
-----------------------------------------------------------------------
| 1 | 1 | 27 | Technology | 183
| 1 | 1 | 24 | Personal Items | 1
| 1 | 2 | 28 | Wallet / Purse | 174
| 1 | 2 | 24 | Personal Items | 3
| 1 | 2 | 26 | Spiritual | 1
| 1 | 3 | 24 | Personal Items | 53
| 1 | 3 | 25 | Food / Fluids | 5
| 1 | 3 | 26 | Spiritual | 5
| 1 | 3 | 27 | Technology | 1
| 1 | 3 | 28 | Wallet / Purse | 1
As from the example data from above I need it to look like this:
| Questions_Id | Answer_Numeric | Option_Id | Description | Count
-----------------------------------------------------------------------
| 1 | 1 | 27 | Technology | 183
| 1 | 2 | 28 | Wallet / Purse | 174
| 1 | 3 | 24 | Personal Items | 53
I am pretty sure that I need to have a max or something in my Having clause but everything I have tried has not worked. Would really appreciate any help on this.
You can use ROW_NUMBER:
SELECT Questions_Id, Answer_Numeric, Option_Id, Description, [Count]
FROM (
SELECT ans.Questions_Id,ans.Answer_Numeric,ans.Option_Id,
opt.Description, count(ans.Option_Id) as [Count],
ROW_NUMBER() OVER (PARTITION BY ans.Questions_Id, ans.Answer_Numeric
ORDER BY count(ans.Option_Id) DESC) AS rn
FROM Answers ans
LEFT OUTER JOIN Questions que
ON ans.Questions_Id = que.Id
LEFT OUTER JOIN Options opt
ON ans.Option_Id = opt.Id
WHERE que.Survey_Id = 1
and ans.Questions_Id = 1
GROUP By ans.Questions_Id,
ans.Answer_Numeric,
ans.Option_Id,
opt.Description) AS t
WHERE t.rn = 1
ORDER BY 2, 5 desc
Alternatively you can use RANK so as to handle ties, i.e. more than one rows per Questions_Id, Answer_Numeric partition sharing the same maximum Count number.
Use row_number():
SELECT *
FROM (SELECT ans.Questions_Id, ans.Answer_Numeric, ans.Option_Id, opt.Description,
count(*) as cnt,
row_number() over (partition by ans.Questions_Id, ans.Answer_Numeric
order by count(*) desc) as seqnum
FROM Answers ans LEFT OUTER JOIN
Questions que
ON ans.Questions_Id = que.Id LEFT OUTER JOIN
Options opt
ON ans.Option_Id = opt.Id
WHERE que.Survey_Id = 1 and ans.Questions_Id = 1
GROUP By ans.Questions_Id, ans.Answer_Numeric, ans.Option_Id, opt.Description
) t
WHERE seqnum = 1
ORDER BY 2, 5 desc;
we can get the same result set in different ways and I have taken sample data set you just merge your joins in this code
declare #Table1 TABLE
(Id int, Answer int, OptionId int, Description varchar(14), Count int)
;
INSERT INTO #Table1
(Id, Answer, OptionId, Description, Count)
VALUES
(1, 1, 27, 'Technology', 183),
(1, 1, 24, 'Personal Items', 1),
(1, 2, 28, 'Wallet / Purse', 174),
(1, 2, 24, 'Personal Items', 3),
(1, 2, 26, 'Spiritual', 1),
(1, 3, 24, 'Personal Items', 53),
(1, 3, 25, 'Food / Fluids', 5),
(1, 3, 26, 'Spiritual', 5),
(1, 3, 27, 'Technology', 1),
(1, 3, 28, 'Wallet / Purse', 1)
;
SELECT tt.Id, tt.Answer, tt.OptionId, tt.Description, tt.Count
FROM #Table1 tt
INNER JOIN
(SELECT OptionId, MAX(Count)OVER(PARTITION BY OptionId ORDER BY OptionId)AS RN
FROM #Table1
GROUP BY OptionId,count) groupedtt
ON
tt.Count = groupedtt.RN
WHERE tt.Count <> 5
GROUP BY tt.Id, tt.Answer, tt.OptionId, tt.Description, tt.Count
OR
select distinct Count, Description , Id , Answer from #Table1 e where 1 =
(select count(distinct Count ) from #Table1 where
Count >= e.Count and (Description = e.Description))
I have a table containing employees id, year id, client id, and the number of sales. For example:
--------------------------------------
id_emp | id_year | sales | client id
--------------------------------------
4 | 1 | 14 | 1
4 | 1 | 10 | 2
4 | 2 | 11 | 1
4 | 2 | 17 | 2
For a employee, I want to obtain rows with the minimum sales per year and the minimum sales of the previous year.
One of the queries I tried is the following:
select distinct
id_emp,
id_year,
MIN(sales) OVER(partition by id_emp, id_year) AS min_sales,
LAG(min(sales), 1) OVER(PARTITION BY id_emp, id_year
ORDER BY id_emp, id_year) AS previous
from facts
where id_emp = 4
group by id_emp, id_year, sales;
I get the result:
-------------------------------------
id_emp | id_year | sales | previous
-------------------------------------
4 | 1 | 10 | (null)
4 | 1 | 10 | 10
4 | 2 | 11 | (null)
but I expect to get:
-------------------------------------
id_emp | id_year | sales | previous
-------------------------------------
4 | 1 | 10 | (null)
4 | 2 | 11 | 10
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE EMPLOYEE_SALES ( id_emp, id_year, sales, client_id ) AS
SELECT 4, 1, 14, 1 FROM DUAL
UNION ALL SELECT 4, 1, 10, 2 FROM DUAL
UNION ALL SELECT 4, 2, 11, 1 FROM DUAL
UNION ALL SELECT 4, 2, 17, 2 FROM DUAL;
Query 1:
SELECT ID_EMP,
ID_YEAR,
SALES AS SALES,
LAG( SALES ) OVER ( PARTITION BY ID_EMP ORDER BY ID_YEAR ) AS PREVIOUS
FROM (
SELECT e.*,
ROW_NUMBER() OVER ( PARTITION BY id_emp, id_year ORDER BY sales ) AS RN
FROM EMPLOYEE_SALES e
)
WHERE rn = 1
Query 2:
SELECT ID_EMP,
ID_YEAR,
MIN( SALES ) AS SALES,
LAG( MIN( SALES ) ) OVER ( PARTITION BY ID_EMP ORDER BY ID_YEAR ) AS PREVIOUS
FROM EMPLOYEE_SALES
GROUP BY ID_EMP, ID_YEAR
Results - Both give the same output:
| ID_EMP | ID_YEAR | SALES | PREVIOUS |
|--------|---------|-------|----------|
| 4 | 1 | 10 | (null) |
| 4 | 2 | 11 | 10 |
You mean like this?
select id_emp, id_year, min(sales) as min_sales,
lag(min(sales)) over (partition by id_emp order by id_year) as prev_year_min_sales
from facts
where id_emp = 4
group by id_emp, id_year;
I believe it is because you are using sales column in your group by statement.
Try to remove it and just use
GROUP BY id_emp,id_year
You could get your desired output using ROW_NUMBER() and LAG() analytic functions.
For example,
Table
SQL> SELECT * FROM t;
ID_EMP ID_YEAR SALES CLIENT_ID
---------- ---------- ---------- ----------
4 1 14 1
4 1 10 2
4 2 11 1
4 2 17 2
Query
SQL> WITH DATA AS
2 (SELECT t.*,
3 row_number() OVER(PARTITION BY id_emp, id_year ORDER BY sales) rn
4 FROM t
5 )
6 SELECT id_emp,
7 id_year ,
8 sales ,
9 lag(sales) over(order by sales) previous
10 FROM DATA
11 WHERE rn =1;
ID_EMP ID_YEAR SALES PREVIOUS
---------- ---------- ---------- ----------
4 1 10
4 2 11 10
Usually I'm decent at set-based tsql problems. But this one is beating me.
I've been working 3 days on converting a while-loop procedure into a setbased one. I've gotten to the point below.......but can't make the final jump.
I have the following rows. MyOrdinal will be "in order" ... and a second column (MyMarker) will alternate between having a value and being null. Whenever this "flip" occurs on MyMarker, I would like to increment a "group by" ordinal counter by one. Whenever the "flip" values are non-null or null, these are grouped together as a set.
I've tried several things, but it was too ugly to post. That and since moving to ORM, I don't spend as much time in the tsql anymore.
declare #Holder table ( MyOrdinal int not null , MyMarker int , MyGroupNumber int )
INSERT INTO #Holder (MyOrdinal, MyMarker)
Select 1 , 1
union all Select 2, 2
union all Select 3, null
union all Select 4, 3
union all Select 5, 4
union all Select 6, 5
union all Select 7, 6
union all Select 8, 7
union all Select 9, 8
union all Select 10, 9
union all Select 11, 10
union all Select 12, 11
union all Select 13, 12
union all Select 14, 13
union all Select 15, 14
union all Select 16, 15
union all Select 17, null
union all Select 18, null
union all Select 19, null
union all Select 20, 16
union all Select 21, 17
union all Select 22, 18
union all Select 23, null
union all Select 24, null
union all Select 25, 19
union all Select 26, 20
union all Select 27, null
union all Select 28, 21
Select * from #Holder
Desired Output
| MyOrdinal | MyMarker | MyGroupNumber |
|-----------|----------|---------------|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | null | 2 |
| 4 | 3 | 3 |
| 5 | 4 | 3 |
| 6 | 5 | 3 |
| 7 | 6 | 3 |
| 8 | 7 | 3 |
| 9 | 8 | 3 |
| 10 | 9 | 3 |
| 11 | 10 | 3 |
| 12 | 11 | 3 |
| 13 | 12 | 3 |
| 14 | 13 | 3 |
| 15 | 14 | 3 |
| 16 | 15 | 3 |
| 17 | null | 4 |
| 18 | null | 4 |
| 19 | null | 4 |
| 20 | 16 | 5 |
| 21 | 17 | 5 |
| 22 | 18 | 5 |
| 23 | null | 6 |
| 24 | null | 6 |
| 25 | 19 | 7 |
| 26 | 20 | 7 |
| 27 | null | 8 |
| 28 | 21 | 9 |
Try this one:
First, this assigns a same ROW_NUMBER for continuous Non-NULL MyMarker. ROW_NUMBER is NULL for NULL MyMarkers. After that, you want to add a ROW_NUMBER for NULL MyMarkers such that the value is between the previous NON-NULL and the next NON-NULL. Then use DENSE_RANK to finally assign MyGroupNumber:
SQL Fiddle
;WITH Cte AS(
SELECT *,
RN = ROW_NUMBER() OVER(ORDER BY MyOrdinal) - MyMarker + 1
FROM #Holder
),
CteApply AS(
SELECT
t.MyOrdinal,
t.MyMarker,
MyGroupNumber =
CASE
WHEN RN IS NULL THEN x.NewRN
ELSE RN
END
FROM Cte t
OUTER APPLY(
SELECT TOP 1 RN * 1.1 AS NewRN
FROM Cte
WHERE
t.MyOrdinal > MyOrdinal
AND MyMarker IS NOT NULL
ORDER BY MyOrdinal DESC
)x
)
SELECT
MyOrdinal,
MyMarker,
MyGroupNumber = DENSE_RANK() OVER(ORDER BY MyGroupNumber)
FROM CteApply
For Sql Server 2012:
select *, sum(b) over(order by myordinal)
from(select *,
case when (lag(mymarker) over(order by myordinal) is not null
and mymarker is null) or
(lag(mymarker) over(order by myordinal) is null
and mymarker is not null)
then 1 else 0 end as b
from #Holder) t
First you mark rows with 1 where there is a change from null to not null or from not null to null. Other columns are marked as 0. Then running sum of all rows till current.
Fiddle http://sqlfiddle.com/#!6/9eecb/5015
For Sql Server 2008:
with cte1 as (select *,
case when (select max(enddate) from t ti
where ti.ruleid = t.ruleid and ti.startdate < t.startdate) = startdate
then 0 else 1 end as b
from t),
cte2 as(select *, sum(b) over(partition by ruleid order by startdate) as s
from cte1)
select RuleID,
Name,
min(startdate),
case when count(*) = count(enddate)
then max(enddate) else null end from cte2
group by s, ruleid, name
Fiddle http://sqlfiddle.com/#!6/4191d/6