create table #t
(
id int,
SomeNumt int
)
insert into #t
select 1,10
union
select 2,12
union
select 3,3
union
select 4,15
union
select 5,23
select * from #t
The SELECT above returns the following:
id SomeNumt
1 10
2 12
3 3
4 15
5 23
How do I get the following:
id SomeNumt CumSrome
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from #t t1
inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id
SQL Fiddle example
Output
| ID | SOMENUMT | SUM |
-----------------------
| 1 | 10 | 10 |
| 2 | 12 | 22 |
| 3 | 3 | 25 |
| 4 | 15 | 40 |
| 5 | 23 | 63 |
Edit: this is a generalized solution that will work across most db platforms. When there is a better solution available for your specific platform (e.g., gareth's), use it!
The latest version of SQL Server (2012) permits the following.
SELECT
RowID,
Col1,
SUM(Col1) OVER(ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
or
SELECT
GroupID,
RowID,
Col1,
SUM(Col1) OVER(PARTITION BY GroupID ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
This is even faster. Partitioned version completes in 34 seconds over 5 million rows for me.
Thanks to Peso, who commented on the SQL Team thread referred to in another answer.
From SQL Server 2012 onwards this is easy:
SELECT id, SomeNumt, sum(SomeNumt) OVER (ORDER BY id) as CumSrome FROM #t
This works because, when ORDER BY is specified without an explicit frame, SUM defaults to RANGE UNBOUNDED PRECEDING AND CURRENT ROW for the window frame (see "General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx).
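The default RANGE frame and an explicit ROWS frame only differ when the ORDER BY column has ties. A minimal sketch of that difference, assuming SQL Server 2012 or later; the table variable `@ties` and its data are made up for illustration:

```sql
-- Hypothetical sample data: id = 2 appears twice, so it is a tie in ORDER BY id.
DECLARE @ties TABLE (id int, amount int);
INSERT INTO @ties (id, amount) VALUES (1, 10), (2, 5), (2, 7), (3, 1);

SELECT id,
       amount,
       -- Default frame (RANGE): tied rows are peers and share one total.
       SUM(amount) OVER (ORDER BY id) AS range_default,
       -- Explicit ROWS frame: every physical row gets its own running total.
       SUM(amount) OVER (ORDER BY id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
FROM @ties
ORDER BY id;
```

With RANGE, both id = 2 rows show 22, because peers are summed together. With ROWS they get different totals, and which of the two tied rows comes first is not deterministic unless you add a tie-breaker to the ORDER BY.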
Let's first create a table with dummy data:
Create Table CUMULATIVESUM (id tinyint , SomeValue tinyint)
Now let's insert some data into the table;
Insert Into CUMULATIVESUM
Select 1, 10 union
Select 2, 2 union
Select 3, 6 union
Select 4, 10
Here I join the table to itself (a self join):
Select c1.ID, c1.SomeValue, c2.SomeValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Order By c1.id Asc
Result:
ID SomeValue SomeValue
-------------------------
1 10 10
2 2 10
2 2 2
3 6 10
3 6 2
3 6 6
4 10 10
4 10 2
4 10 6
4 10 10
Now just sum the SomeValue of c2 and we'll get the answer:
Select c1.ID, c1.SomeValue, Sum(c2.SomeValue) CumulativeSumValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Group By c1.ID, c1.SomeValue
Order By c1.id Asc
For SQL Server 2012 and above (much better performance):
Select
c1.ID, c1.SomeValue,
Sum (c1.SomeValue) Over (Order By c1.ID) As CumulativeSumValue
From CumulativeSum c1
Order By c1.id Asc
Desired result:
ID SomeValue CumulativeSumValue
---------------------------------
1 10 10
2 2 12
3 6 18
4 10 28
Drop Table CumulativeSum
A CTE version, just for fun:
;
WITH abcd
AS ( SELECT id
,SomeNumt
,SomeNumt AS MySum
FROM #t
WHERE id = 1
UNION ALL
SELECT t.id
,t.SomeNumt
,t.SomeNumt + a.MySum AS MySum
FROM #t AS t
JOIN abcd AS a ON a.id = t.id - 1
)
SELECT * FROM abcd
OPTION ( MAXRECURSION 1000 ) -- limit recursion here, or 0 for no limit.
Returns:
id SomeNumt MySum
----------- ----------- -----------
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
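Note that the recursive CTE above joins on `a.id = t.id - 1`, so it relies on id starting at 1 and having no gaps. A rough sketch of one way around that, numbering the rows first and recursing on the row number instead; the temp table `#gaps` and its data are hypothetical:

```sql
-- Hypothetical data with gaps in id.
CREATE TABLE #gaps (id int, SomeNumt int);
INSERT INTO #gaps (id, SomeNumt) VALUES (10, 10), (20, 12), (35, 3);

WITH numbered AS (
    -- Give every row a gap-free sequence number in id order.
    SELECT id, SomeNumt, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM #gaps
),
running AS (
    -- Anchor: the first row starts the running total.
    SELECT id, SomeNumt, rn, SomeNumt AS MySum
    FROM numbered
    WHERE rn = 1
    UNION ALL
    -- Recursive step: add the next row's value to the previous total.
    SELECT n.id, n.SomeNumt, n.rn, n.SomeNumt + r.MySum
    FROM numbered AS n
    JOIN running AS r ON n.rn = r.rn + 1
)
SELECT id, SomeNumt, MySum
FROM running
ORDER BY id
OPTION (MAXRECURSION 0);  -- 0 = no recursion limit

DROP TABLE #gaps;
```

This is only meant to show the idea. The numbered CTE may be re-evaluated on each recursion step, so for large tables the window-function answers are likely the better choice.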
A late answer, but it shows one more possibility.
Generating a cumulative sum can be optimized further with CROSS APPLY.
Judging by the actual query plans, it performed better here than the INNER JOIN and the OVER clause:
/* Create table & populate data */
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
DROP TABLE #TMP
SELECT * INTO #TMP
FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
UNION
SELECT 4 AS id
UNION
SELECT 5 AS id
) Tab
/* Using CROSS APPLY
Query cost relative to the batch 17%
*/
SELECT T1.id,
T2.CumSum
FROM #TMP T1
CROSS APPLY (
SELECT SUM(T2.id) AS CumSum
FROM #TMP T2
WHERE T1.id >= T2.id
) T2
/* Using INNER JOIN
Query cost relative to the batch 46%
*/
SELECT T1.id,
SUM(T2.id) CumSum
FROM #TMP T1
INNER JOIN #TMP T2
ON T1.id >= T2.id
GROUP BY T1.id
/* Using OVER clause
Query cost relative to the batch 37%
*/
SELECT T1.id,
SUM(T1.id) OVER (ORDER BY T1.id) AS CumSum
FROM #TMP T1
Output:-
id CumSum
------- -------
1 1
2 3
3 6
4 10
5 15
Select
*,
(Select Sum(SOMENUMT)
From #t S
Where S.id <= M.id)
From #t M
You can use this simple query for a running calculation:
select
id
,SomeNumt
,sum(SomeNumt) over(order by id ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as CumSrome
from #t
There is a much faster CTE implementation available in this excellent post:
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
The problem in this thread can be expressed like this:
DECLARE @RT INT
SELECT @RT = 0
;
WITH abcd
AS ( SELECT TOP 100 percent
id
,SomeNumt
,MySum
FROM #t -- assumes #t has an extra MySum column to hold the running total
order by id
)
update abcd
set @RT = MySum = @RT + SomeNumt
output inserted.*
output inserted.*
For example, if you have a table with two columns, ID and Number, and want to find the cumulative sum:
SELECT ID,Number,SUM(Number)OVER(ORDER BY ID) FROM T
Once the table is created -
select
A.id, A.SomeNumt, SUM(B.SomeNumt) as sum
from #t A, #t B where A.id >= B.id
group by A.id, A.SomeNumt
order by A.id
The SQL solution which combines "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "SUM" did exactly what I wanted to achieve.
Thank you so much!
In case it helps anyone, here was my case. I wanted to add 1 to a running count whenever the maker is "Some Maker" (example). Otherwise there is no increment and the previous count is shown.
So this piece of SQL:
SUM( CASE [rmaker] WHEN 'Some Maker' THEN 1 ELSE 0 END)
OVER
(PARTITION BY UserID ORDER BY UserID,[rrank] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
Allowed me to get something like this:
User 1 Rank1 MakerA 0
User 1 Rank2 MakerB 0
User 1 Rank3 Some Maker 1
User 1 Rank4 Some Maker 2
User 1 Rank5 MakerC 2
User 1 Rank6 Some Maker 3
User 2 Rank1 MakerA 0
User 2 Rank2 Some Maker 1
Explanation of the above: the count of "Some Maker" starts at 0, and each time Some Maker is found we add 1. For User 1, MakerC is found at Rank5, so we don't add 1 and the running count stays at 2 until the next Some Maker row.
Partitioning is by user, so when the user changes, the cumulative count goes back to zero.
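For anyone who wants to try the snippet above end to end, here is a self-contained sketch. The table variable `@picks` and its column types are my assumption; only the column names UserID, rrank and rmaker come from the snippet:

```sql
-- Hypothetical table holding the sample rows shown above.
DECLARE @picks TABLE (UserID int, rrank int, rmaker varchar(20));
INSERT INTO @picks (UserID, rrank, rmaker) VALUES
    (1, 1, 'MakerA'), (1, 2, 'MakerB'), (1, 3, 'Some Maker'),
    (1, 4, 'Some Maker'), (1, 5, 'MakerC'), (1, 6, 'Some Maker'),
    (2, 1, 'MakerA'), (2, 2, 'Some Maker');

SELECT UserID,
       rrank,
       rmaker,
       -- Count 1 for each 'Some Maker' row, accumulated per user in rank order.
       SUM(CASE rmaker WHEN 'Some Maker' THEN 1 ELSE 0 END)
           OVER (PARTITION BY UserID
                 ORDER BY rrank
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
FROM @picks
ORDER BY UserID, rrank;
```

Since the window is already partitioned by UserID, ordering by rrank alone inside the OVER clause is enough; repeating UserID there does no harm but adds nothing.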
I am at work and don't want any credit for this answer; I just want to say thank you and show my example in case someone is in the same situation. I was trying to combine SUM and PARTITION, but the amazing syntax "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" completed the task.
Thanks!
Groaker
Above (Pre-SQL12) we see examples like this:-
SELECT
T1.id, SUM(T2.id) AS CumSum
FROM
#TMP T1
JOIN #TMP T2 ON T2.id <= T1.id
GROUP BY
T1.id
More efficient...
SELECT
T1.id, ISNULL(SUM(T2.id), 0) + T1.id AS CumSum
FROM
#TMP T1
LEFT JOIN #TMP T2 ON T2.id < T1.id
GROUP BY
T1.id
Try this
select
t.id,
t.SomeNumt,
sum(t.SomeNumt) Over (Order by t.id asc Rows Between Unbounded Preceding and Current Row) as cum
from
#t t
group by
t.id,
t.SomeNumt
order by
t.id asc;
Try this:
CREATE TABLE #t(
[name] varchar(10) NULL,
[val] [int] NULL,
[ID] [int] NULL
) ON [PRIMARY]
insert into #t (id,name,val) values
(1,'A',10), (2,'B',20), (3,'C',30)
select t1.id, t1.val, SUM(t2.val) as cumSum
from #t t1 inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.val order by t1.id
Without using any type of JOIN, the cumulative salary per person can be fetched with the following query:
SELECT * , (
SELECT SUM( salary )
FROM `abc` AS table1
WHERE table1.ID <= `abc`.ID
AND table1.name = `abc`.Name
) AS cum
FROM `abc`
ORDER BY Name
I am trying to write a query to pick one entry for each item for each month but the latest in the month from the following table:
Name | Date | Value
a |2015-01-01 | 1
a |2015-01-02 | 2
b |2015-01-03 | 1
b |2015-01-04 | 1
b |2015-01-03 | 3
c |2015-01-02 | 2
c |2015-01-29 | 10
a |2015-02-10 | 2
a |2015-02-20 | 1
c |2015-02-10 | 2
c |2015-02-22 | 23
b |2015-02-25 | 1
b |2015-02-19 | 2
return should be:
a |2015-01-02 | 2
b |2015-01-04 | 1
c |2015-01-29 | 10
a |2015-02-20 | 1
b |2015-02-25 | 1
c |2015-02-22 | 23
I wonder how this could be achieved without sending a separate query to SQL Server for each month. I would like to load all the values with one query and then filter the collection in memory. Otherwise I would end up writing a query like the one below:
SELECT mt.Name, mt.Date, mt.Value FROM MyTable mt
INNER JOIN (
select max(Date) as MaxDate
FROM [MyTable] m WHERE YEAR(Date) = YEAR(@date)
AND MONTH(Date) = MONTH(@date)) mx ON mt.Date = mx.MaxDate
And this query needs to be run for each month.
Any better idea to return all entries with a single query?
Thanks,
Try grouping by year and month in the derived table:
SELECT t1.Name, t1.[Date], t1.Value
FROM MyTable t1
INNER JOIN (
SELECT Name, YEAR(Date) AS y, MONTH([Date]) AS m, MAX([Date]) as MaxDate
FROM MyTable
GROUP BY Name, YEAR(Date), MONTH([Date])
) t2 ON t1.Name = t2.Name AND
YEAR(t1.[Date]) = t2.y AND MONTH(t1.[Date]) = t2.m AND
t1.[Date] = t2.MaxDate
SELECT *
FROM (
SELECT NAME, DATE, VALUE,
ROW_NUMBER() OVER (PARTITION BY NAME, YEAR(Date), MONTH(Date)
ORDER BY Date DESC) rn
FROM MyTable) AS t
WHERE t.rn = 1
Assuming that you are using a SQL Server version that supports it, you can use the ROW_NUMBER() windowing function to return a sequence number for each row, then you can subsequently use that to restrict to only the rows that you require.
SELECT [Name],[Date],[Value]
,ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY [Date] DESC) AS [Seq]
FROM myTable
Things to consider:
What happens when there is a tie? ROW_NUMBER will always return a sequence number, but if your data has more than one row with the same Date value, the order among them will be arbitrary. To solve this, add additional tie-break ORDER BY entries.
How do I filter this? Put it into a Common Table Expression, Inline View or Real View
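Putting the two points above together, a sketch of the CTE variant could look like this. The table variable `@MyTable` and its sample rows are placeholders, and I have added the year and month to the PARTITION BY (the question asks for the latest row per month) plus `[Value] DESC` as one possible tie-breaker:

```sql
-- Hypothetical subset of the question's data.
DECLARE @MyTable TABLE ([Name] char(1), [Date] date, [Value] int);
INSERT INTO @MyTable ([Name], [Date], [Value]) VALUES
    ('a', '2015-01-01', 1), ('a', '2015-01-02', 2),
    ('b', '2015-01-03', 1), ('b', '2015-01-04', 1),
    ('a', '2015-02-10', 2), ('a', '2015-02-20', 1);

WITH Ranked AS (
    SELECT [Name], [Date], [Value],
           -- Number rows per name and month, newest first;
           -- [Value] DESC is an assumed tie-breaker for equal dates.
           ROW_NUMBER() OVER (PARTITION BY [Name], YEAR([Date]), MONTH([Date])
                              ORDER BY [Date] DESC, [Value] DESC) AS [Seq]
    FROM @MyTable
)
SELECT [Name], [Date], [Value]
FROM Ranked
WHERE [Seq] = 1          -- keep only the latest row of each group
ORDER BY [Date];
```

Which column makes a sensible tie-breaker depends on your data; a unique key is the safest choice if the table has one.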
I think you need a correlated query once you have a set of distinct (Name, Month). There are various ways of doing this, one is to use cross apply:
select *
from (select distinct Name, Month(Date) as Month
from theTable) itemMonths
cross apply (select Max(value) as MaxValue
from theTable t
where Month(t.Date) = itemMonths.Month
and t.Name = itemMonths.Name) monthMax
You could try the following:
WITH MyTable AS
(SELECT 'a' AS name, GETDATE() AS date, 1 AS value
UNION ALL
SELECT 'a', GETDATE()+1, 2
)
, res AS (
SELECT Name,date,MAX(Date) OVER(PARTITION BY Name, DATEPART(yyyy,date), DATEPART(mm, date)) AS max_date , Value FROM MyTable
)
SELECT name,date,res.value FROM res WHERE date=max_date
You still need a filter though as the Max Over will return all rows.
If you were using Teradata I'd suggest using the Qualify Clause but Itzik hasn't had any luck getting this ported to SQL server!
https://connect.microsoft.com/SQLServer/feedback/details/532474
Use Cross apply
SELECT b.*
FROM mytable mt
CROSS apply (SELECT TOP 1 NAME, date, value
FROM [mytable] m
WHERE m.NAME = mt.NAME
AND Month(m.date) = Month(mt.date)
AND Year(m.date) = Year(mt.date)
ORDER BY m.date DESC) b
Let's say I have this table:
ColA | ColB | SortOrder
------------------------
1 | A | 1
NULL | B | 2
2 | C | 3
NULL | D | 4
3 | E | 5
NULL | F | 6
...
This structure is repeating and will always remain in this order.
My desired output is:
ColA | ColB
-----------
1 | A B
2 | C D
3 | E F
...
How can I achieve this?
Join the table to itself and concatenate the rows.
Select a.ColA, a.ColB + ' ' + b.ColB from MyTable a
inner join MyTable b on a.sortOrder = b.sortOrder-1
WHERE a.ColA is not null
SQL fiddle:
http://sqlfiddle.com/#!6/b01df/5
Also, for the sake of completeness, here it is with no joins using window functions:
select cola, lag + ' ' + colb from (
Select lag(cola,1) over (order by sortOrder asc) cola, a.colB, lag(colb,1) over (order by sortOrder asc) lag from MyTable a
)a where cola is not null
Try the following:
SELECT colA,colAB+' '+colB colB FROM
( SELECT ROW_NUMBER() OVER (ORDER BY SortOrder) idA, colA, colB colAB
FROM tbl WHERE colA > 0 ) ta INNER JOIN
( SELECT ROW_NUMBER() OVER (ORDER BY SortOrder) idB, colB
FROM tbl WHERE colA is NULL ) tb ON idB = idA
http://sqlfiddle.com/#!6/9da9b/2
I used the ROW_NUMBER() function as a safer option since the column sortOrder could theoretically have gaps in it and therefore is not safe for being used as a link column. If sortOrder is strictly without gaps, you can use it of course directly (like Philip Devine suggested).
My datatable:
[a] | [b]
----+----
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
3 | 1
What is the correct select for:
SELECT a FROM table WHERE b = 1 AND b = 2 AND b = 3 // Result = 1
SELECT a FROM table WHERE b = 1 AND b = 2 // Result = 2
EDIT:
Thanks, this query resolved my problem:
SELECT a FROM table WHERE b IN (1,2,3) AND a IN (SELECT a FROM table GROUP BY a HAVING count(*) = 3) GROUP BY a HAVING count(*) = 3 // Result = 1
SELECT a FROM table WHERE b IN (1,2) AND a IN (SELECT a FROM table GROUP BY a HAVING count(*) = 2) GROUP BY a HAVING count(*) = 2 // Result = 2
Not exactly clear what you're asking, but I think you're looking for EXISTS: http://www.postgresql.org/docs/9.4/static/functions-subquery.html
Depending on other constraints on your data, you may be able to do:
SELECT a FROM "table" WHERE b IN(1,2,3) GROUP BY a HAVING count(*) = 3
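If the goal is "a has exactly this set of b values and nothing else", and (a, b) pairs might repeat, a variant with conditional counting may be closer to what is wanted. This is only a sketch in PostgreSQL syntax; the temp table `foo` stands in for the real table:

```sql
-- Hypothetical copy of the sample data from the question.
CREATE TEMP TABLE foo (a int, b int);
INSERT INTO foo (a, b) VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1);

SELECT a
FROM foo
GROUP BY a
HAVING count(DISTINCT b) = 3                       -- a has exactly three distinct b values
   AND count(DISTINCT CASE WHEN b IN (1, 2, 3)     -- and all three are in the wanted set
                           THEN b END) = 3;
```

On the sample data this returns a = 1. For the second case, replace the list with (1, 2) and both 3s with 2, which returns a = 2. Unlike the plain `count(*)` check, this does not count a = 1 as a match for the set (1, 2), because a = 1 also has b = 3.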
Since OP lacks some info:
select a from (
select a,row_number() over(partition by a) rn from foo
where b in (1,2,3) )t
where rn=(select count(a) from foo where a =1) -- you can use `rn` =3 instead of `select count(a) from foo where a =1`
select a from (
select a,row_number() over(partition by a) rn from foo
where b in (1,2) )t
where rn=(select count(a) from foo where a =2)-- you can use `rn` =2 instead of `select count(a) from foo where a =2`
MSSQL
Table looks like so
ID 1 | 2 | 3 | 4 | 5
AA1 1 | 1 | 1 | 2 | 1
any clues on how I could make a query to return
ID | MaxNo
AA1 | 4
using the above table example? I know I could write a CASE ... WHEN statement, but I have a feeling there's a much simpler way of doing this.
You can use UNPIVOT to get these comparable items, correctly [1], into the same column, and then use ROW_NUMBER() to find the highest valued row [2]:
declare @t table (ID char(3) not null,[1] int not null,[2] int not null,
[3] int not null,[4] int not null,[5] int not null)
insert into @t (ID,[1],[2],[3],[4],[5]) values
('AA1',1,1,1,2,1)
;With Unpivoted as (
select *,ROW_NUMBER() OVER (ORDER BY Value desc) rn
from @t t UNPIVOT (Value FOR Col in ([1],[2],[3],[4],[5])) u
)
select * from Unpivoted where rn = 1
Result:
ID Value Col rn
---- ----------- ------------------------- --------------------
AA1 2 4 1
[1] If you have data from the same "domain" appearing in multiple columns in the same table (such that it even makes sense to compare such values), it's usually a sign of attribute splitting, where part of your data has, incorrectly, been used to form part of a column name.
[2] In your question, you say "per row", and yet you've only given a one row sample. If we assume that ID values are unique for each row, and you want to find the maximum separately for each ID, you'd write the ROW_NUMBER() as ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value desc) rn, to get (I hope) the result you're looking for.
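To make the per-ID variant from the second footnote concrete, here is a sketch of the full query with PARTITION BY ID. The second row ('BB2') is invented so that there is more than one ID to partition over:

```sql
-- Sample row from the question plus one hypothetical extra row.
DECLARE @t TABLE (ID char(3) NOT NULL, [1] int NOT NULL, [2] int NOT NULL,
                  [3] int NOT NULL, [4] int NOT NULL, [5] int NOT NULL);
INSERT INTO @t (ID, [1], [2], [3], [4], [5]) VALUES
    ('AA1', 1, 1, 1, 2, 1),
    ('BB2', 7, 3, 9, 2, 1);

WITH Unpivoted AS (
    -- One row per (ID, column name, value); rn = 1 marks the highest value per ID.
    SELECT ID, Col, Value,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value DESC) AS rn
    FROM @t AS t
    UNPIVOT (Value FOR Col IN ([1], [2], [3], [4], [5])) AS u
)
SELECT ID, Col AS MaxNo, Value
FROM Unpivoted
WHERE rn = 1;
```

For AA1 this picks column 4 (value 2), and for the made-up BB2 row column 3 (value 9). If two columns share the highest value for an ID, which one wins is arbitrary unless you add a tie-breaker such as `Col` to the ORDER BY.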
You can use a cross apply where you do max() over the columns for one row.
select T1.ID,
T2.Value
from YourTable as T1
cross apply
(
select max(T.Value) as Value
from (values (T1.[1]),
(T1.[2]),
(T1.[3]),
(T1.[4]),
(T1.[5])) as T(Value)
) as T2
If you are on SQL Server 2005 you can use union all in the derived table instead of values().
select T1.ID,
T2.Value
from YourTable as T1
cross apply
(
select max(T.Value) as Value
from (select T1.[1] union all
select T1.[2] union all
select T1.[3] union all
select T1.[4] union all
select T1.[5]) as T(Value)
) as T2
SQL Fiddle