Select only if more than X occurrences

My data is something like this:
Client Number | Order Date | Order Amount | Sequence (created with Row_Number())
I have created a sequence with Row_Number(), so I can see how many orders a client has.
If I use WHERE Sequence > 3, I lose each client's first three orders. I can't use HAVING because I need to see every order. How can I select the Client Numbers with more than 3 orders?
I would like to see:
Client Number | Order Date | Order Amount | Sequence
1111 | Jan 01 | 100 | 1
1111 | Jan 02 | 100 | 2
1111 | Jan 03 | 100 | 3
1112 | Jan 01 | 100 | 1
1112 | ... | ... | ...
1112 | Jan 20 | 100 | 20
So only the clients with a Sequence above 3, while still keeping their lines with Sequence 1 and 2.

SELECT *
FROM data
WHERE ClientNumber IN
(
SELECT ClientNumber
FROM data
GROUP BY ClientNumber
HAVING COUNT(1) >= 3
);
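To try the subquery approach without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module as a stand-in engine. The rows are the ones from the `#test` example in the other answer, with dates rewritten in ISO format; the function name `clients_with_min_orders` is hypothetical.

```python
import sqlite3

# Sample rows from the #test example (ClientNumber, OrderDate, OrderAmount),
# with dates rewritten as ISO strings for SQLite.
ROWS = [
    (1110, "2016-01-01", 100), (1110, "2016-01-02", 100),
    (1111, "2016-01-01", 100), (1111, "2016-01-02", 100), (1111, "2016-01-03", 100),
    (1112, "2016-01-01", 100), (1112, "2016-01-02", 100),
    (1112, "2016-01-03", 100), (1112, "2016-01-04", 100),
]


def clients_with_min_orders(rows, min_orders):
    """Return every order row of the clients that have at least min_orders orders."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (ClientNumber INT, OrderDate TEXT, OrderAmount INT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    # The GROUP BY/HAVING subquery picks the qualifying clients; the outer
    # query then returns all of their rows, not one row per client.
    result = con.execute(
        """
        SELECT * FROM data
        WHERE ClientNumber IN (
            SELECT ClientNumber FROM data
            GROUP BY ClientNumber
            HAVING COUNT(1) >= ?
        )
        ORDER BY ClientNumber, OrderDate
        """,
        (min_orders,),
    ).fetchall()
    con.close()
    return result


for row in clients_with_min_orders(ROWS, 3):
    print(row)
```

Because the filter lives in a subquery, the outer rows are never aggregated, which is what keeps the lines with Sequence 1 and 2.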

create table #test(clientnumber int, orderdate datetime, orderamount int)
insert into #test values
(1110, '01/01/2016', 100),
(1110, '01/02/2016', 100),
(1111, '01/01/2016', 100),
(1111, '01/02/2016', 100),
(1111, '01/03/2016', 100),
(1112, '01/01/2016', 100),
(1112, '01/02/2016', 100),
(1112, '01/03/2016', 100),
(1112, '01/04/2016', 100);
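-- count(*) over (partition by clientnumber) puts each client's total
-- number of orders on every one of that client's rows
with cte as(
select clientnumber, orderdate, orderamount,
count(*) over(partition by clientnumber ) as ran
from #test)
-- filtering on that count keeps all rows of clients with 3 or more orders
select * from cte
where ran >= 3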
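The windowed-count version can be checked the same way. This is a sketch, assuming Python's sqlite3 module backed by SQLite 3.25 or later (the first release with window functions), standing in for SQL Server; `orders_with_client_count` is a hypothetical name.

```python
import sqlite3

# Same rows as the #test table, with ISO dates.
ROWS = [
    (1110, "2016-01-01", 100), (1110, "2016-01-02", 100),
    (1111, "2016-01-01", 100), (1111, "2016-01-02", 100), (1111, "2016-01-03", 100),
    (1112, "2016-01-01", 100), (1112, "2016-01-02", 100),
    (1112, "2016-01-03", 100), (1112, "2016-01-04", 100),
]


def orders_with_client_count(rows, min_orders):
    """Return (client, date, amount, client_order_count) for qualifying clients."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (clientnumber INT, orderdate TEXT, orderamount INT)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    # COUNT(*) OVER (PARTITION BY ...) does not collapse rows; it attaches the
    # per-client count to each row so the outer WHERE can filter on it.
    result = con.execute(
        """
        WITH cte AS (
            SELECT clientnumber, orderdate, orderamount,
                   COUNT(*) OVER (PARTITION BY clientnumber) AS ran
            FROM test
        )
        SELECT * FROM cte
        WHERE ran >= ?
        ORDER BY clientnumber, orderdate
        """,
        (min_orders,),
    ).fetchall()
    con.close()
    return result


for row in orders_with_client_count(ROWS, 3):
    print(row)
```

Compared with the IN subquery, this reads the table once, at the cost of needing window function support.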

Related

Snowflake cumulative sum for multiple entry in same date for a given partition

I have a table with the data set below. I want to get the cumulative sum per PK1 and PK2 as of each TXN_DATE. I have tried cumulative window frame functions and they give the expected running total, but I want the output in a format grouped by TXN_DATE.
SELECT
PK1
,PK2
,TXN_DATE
,QTY
,SUM(QTY) OVER (PARTITION BY PK1,PK2 ORDER BY TXN_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) SUM_QTY
FROM MY_TABLE
ORDER BY TXN_DATE;
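The query above gives each row its own running total, so rows that share the same TXN_DATE get different cumulative values. I want the result in one of two formats, with the cumulative sum grouped by TXN_DATE. Can someone help me get the desired result based on this?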

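Just get rid of rows between unbounded preceding and current row in your window function. When a window has an ORDER BY but no explicit frame, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes every peer row with the same TXN_DATE, so the sum() window function gives all rows on the same day the same daily cumulative total.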
with SOURCE_DATA as
(
select COLUMN1::string as PK1
,COLUMN2::string as PK2
,COLUMN3::date as TXN_DATE
,COLUMN4::int as QTY
from (values
('P001', 'XYZ', '2022-11-03', 15),
('P001', 'XYZ', '2022-11-08', -1),
('P001', 'XYZ', '2022-11-12', -4),
('P002', 'ABZ', '2022-11-03', 10),
('P002', 'ABZ', '2022-11-03', 1), -- This was listed as ABC in the photo
('P002', 'ABZ', '2022-11-05', -5),
('P002', 'ABZ', '2022-11-10', -1),
('P002', 'ABZ', '2022-11-10', -1),
('P002', 'ABZ', '2022-11-10', 1)
)
)
select *
,sum(QTY) over (partition by PK1, PK2 order by TXN_DATE) QUANTITY
from SOURCE_DATA
order by PK1, TXN_DATE
;
Output:
PK1 | PK2 | TXN_DATE | QTY | QUANTITY
-----+-----+------------+-----+---------
P001 | XYZ | 2022-11-03 | 15 | 15
P001 | XYZ | 2022-11-08 | -1 | 14
P001 | XYZ | 2022-11-12 | -4 | 10
P002 | ABZ | 2022-11-03 | 10 | 11
P002 | ABZ | 2022-11-03 | 1 | 11
P002 | ABZ | 2022-11-05 | -5 | 6
P002 | ABZ | 2022-11-10 | -1 | 5
P002 | ABZ | 2022-11-10 | -1 | 5
P002 | ABZ | 2022-11-10 | 1 | 5
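The default-frame behaviour can be reproduced outside Snowflake. This sketch uses Python's sqlite3 module (SQLite 3.25 or later), whose default frame with ORDER BY is also RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it assumes Snowflake behaves the same, as the output above suggests. It also shows a one-row-per-date variant that aggregates each day before taking the running total. The function name `cumulative_quantities` is hypothetical.

```python
import sqlite3

# The SOURCE_DATA rows from the answer: (PK1, PK2, TXN_DATE, QTY).
SOURCE_DATA = [
    ("P001", "XYZ", "2022-11-03", 15),
    ("P001", "XYZ", "2022-11-08", -1),
    ("P001", "XYZ", "2022-11-12", -4),
    ("P002", "ABZ", "2022-11-03", 10),
    ("P002", "ABZ", "2022-11-03", 1),
    ("P002", "ABZ", "2022-11-05", -5),
    ("P002", "ABZ", "2022-11-10", -1),
    ("P002", "ABZ", "2022-11-10", -1),
    ("P002", "ABZ", "2022-11-10", 1),
]


def cumulative_quantities(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE source_data (pk1 TEXT, pk2 TEXT, txn_date TEXT, qty INT)")
    con.executemany("INSERT INTO source_data VALUES (?, ?, ?, ?)", rows)
    # Per-row format: no explicit frame, so rows with the same txn_date are
    # peers and all receive the same cumulative total.
    per_row = con.execute(
        """
        SELECT pk1, pk2, txn_date, qty,
               SUM(qty) OVER (PARTITION BY pk1, pk2 ORDER BY txn_date) AS quantity
        FROM source_data
        ORDER BY pk1, txn_date
        """
    ).fetchall()
    # One row per date: total each day first, then run the cumulative sum.
    per_date = con.execute(
        """
        WITH daily AS (
            SELECT pk1, pk2, txn_date, SUM(qty) AS day_qty
            FROM source_data
            GROUP BY pk1, pk2, txn_date
        )
        SELECT pk1, pk2, txn_date, day_qty,
               SUM(day_qty) OVER (PARTITION BY pk1, pk2 ORDER BY txn_date) AS quantity
        FROM daily
        ORDER BY pk1, txn_date
        """
    ).fetchall()
    con.close()
    return per_row, per_date


per_row, per_date = cumulative_quantities(SOURCE_DATA)
for row in per_date:
    print(row)
```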

Splitting data from one record in a specific column T-SQL

I'm working on an old legacy database that got imported into SQL Server 2012 from Oracle. I have the following table called INSOrders, which includes a column called OrderID of type varchar(8).
An example of the data inserted is:
A04-05 | B81-02 | C02-01
A01-01 | B95-01 | C99-05
A02-02 | B06-07 | C03-02
A98-06 | B10-01 | C17-01
A78-07 | B02-03 | C15-03
A79-01 | B02-01 | C78-06
First letter = order type, next 2 digits = year, and last 2 digits = order number within that year.
So I split all the data into 3 columns (not stored, just presented):
select
orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as ordtype,
right(max(datepart(yyyy, '01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as year,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as ordnum
from
ins.insorders
where
orderid is not null
group by
substring(orderid, 0, patindex('%[0-9]%', orderid)), orderid
order by
ordtype
The result looks like this:
OrderID | OrderType | OrderYear | OrderNum
---------+-------------+-------------+----------
A04-05 | A | 04 | 05
A01-01 | A | 01 | 01
B10-03 | B | 10 | 03
B95-01 | B | 95 | 01
etc....
But now I just want to select the max for each OrderType: show only the max for letter A, the max for letter B, etc. By max, I mean that for letter A I need to show the latest year and the latest order number, so if I have A04-01 and A04-02, just show A04-02.
I need to modify my query so that I see the following:
OrderID | OrderType | OrderYear | OrderNum
---------+-------------+-------------+----------
A04-05 | A | 04 | 05
B10-03 | B | 10 | 03
C17-01 | C | 17 | 01
Thank you, I will truly appreciate the help.

You can try the query below. It uses your original query as a CTE and assigns row numbers within each order type, ordered by order year and order number. Then it keeps all rows numbered 1, which are the max for each order type.
This little bit, DATEPART(yyyy,('01/01/' + OrderYear)), makes sure we get the correct year, so that 95 is 1995 and 10 is 2010, etc. It relies on SQL Server's two-digit year cutoff (2049 by default), under which 00-49 become 2000-2049 and 50-99 become 1950-1999.
;WITH cte
AS (
select orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as OrderType,
right(max(datepart(yyyy,'01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as OrderYear,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as OrderNum
from ins.insorders
where orderid is not null
group by substring(orderid, 0, patindex('%[0-9]%', orderid)), orderid
)
SELECT *
FROM
(SELECT
*
, ROW_NUMBER() OVER (PARTITION BY OrderType ORDER BY DATEPART(yyyy,('01/01/' + OrderYear)) DESC, OrderNum DESC) AS RowNum
FROM cte) t
WHERE t.RowNum = 1
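The same "latest per order type" pick can be sketched in plain Python to check the expected output against the sample IDs from the question. This is not the SQL above, just an equivalent of ROW_NUMBER() ... = 1; it assumes two-digit years follow SQL Server's default cutoff of 2049, and the names `full_year`, `split_order_id` and `latest_per_type` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: SQL Server's default "two digit year cutoff" of 2049,
# so 00-49 -> 2000-2049 and 50-99 -> 1950-1999.
TWO_DIGIT_YEAR_CUTOFF = 2049

ORDER_IDS = [
    "A04-05", "B81-02", "C02-01", "A01-01", "B95-01", "C99-05",
    "A02-02", "B06-07", "C03-02", "A98-06", "B10-01", "C17-01",
    "A78-07", "B02-03", "C15-03", "A79-01", "B02-01", "C78-06",
]


def full_year(yy: int, cutoff: int = TWO_DIGIT_YEAR_CUTOFF) -> int:
    """Expand a two-digit year into the 100-year window ending at cutoff."""
    window_start = cutoff - 99                    # 1950 for the default cutoff
    year = (window_start // 100) * 100 + yy       # e.g. 1900 + yy
    return year if year >= window_start else year + 100


def split_order_id(order_id: str) -> Tuple[str, int, int]:
    """'A04-05' -> ('A', 2004, 5): type letter, expanded year, order number."""
    order_type, rest = order_id[0], order_id[1:]
    yy, num = rest.split("-")
    return order_type, full_year(int(yy)), int(num)


def latest_per_type(order_ids: List[str]) -> Dict[str, str]:
    """Keep, per order type, the ID with the greatest (year, order number)."""
    best: Dict[str, Tuple[int, int, str]] = {}
    for oid in order_ids:
        order_type, year, num = split_order_id(oid)
        if order_type not in best or (year, num) > best[order_type][:2]:
            best[order_type] = (year, num, oid)
    return {t: v[2] for t, v in sorted(best.items())}


print(latest_per_type(ORDER_IDS))
```

Comparing (year, order number) as a tuple mirrors ORDER BY year DESC, OrderNum DESC, so A04-02 beats A04-01.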

The data is represented poorly and I only have a way to "cheese" it, and we'll need to make a lot of assumptions:
with cte_example
as
( your query )
select OrderID
,OrderType
,OrderYear
,OrderNum
from
(select *, row_number() over(partition by OrderType order by OrderYear DESC, OrderNum DESC) rn
from cte_example
where OrderYear <= right(year(getdate()),2)) t1
where t1.rn = 1
Since you already have a query extracting the information, I won't bother changing it. We wrap your query in a CTE, query from it, and apply the row_number function to find, for each OrderType, the row with the most recent OrderYear, along with its OrderNum and OrderID.
Now, the tricky part is that the years are poorly represented (assuming my comment on your original post is true), so any sort of aggregation for OrderType B will return 95, since it is numerically the greatest.
We assume that no order year is later than the current year, and that anything greater is from the previous century (such as the 90s), using this statement: where OrderYear <= right(year(getdate()),2). In other words, take the current year and its two rightmost characters: first retrieve 2017 from getdate(), then 17 with the RIGHT function. I'm sure you can see why this is dangerous: what if your latest date is 1999?
So by filtering them out, we can then see the latest year for each OrderType... hope this helps.
Here is the rextester test I built around to play with your query in case you want to try it.

I think your original query was almost exactly what you needed except you need to use MAX(OrderID) and not group by it.
declare @Something table
(
orderid varchar(6)
)
insert @Something
(
orderid
) values
('A04-05'), ('B81-02'), ('C02-01'),
('A01-01'), ('B95-01'), ('C99-05'),
('A02-02'), ('B06-07'), ('C03-02'),
('A98-06'), ('B10-01'), ('C17-01'),
('A78-07'), ('B02-03'), ('C15-03'),
('A79-01'), ('B02-01'), ('C78-06')
select max(orderid) as orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as ordtype,
right(max(datepart(yyyy,'01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as year,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as ordnum
from @Something
where orderid is not null
group by substring(orderid, 0, patindex('%[0-9]%', orderid))
order by ordtype

Getting Running Total of Time column using T-SQL in SQL server

I have a table XYZ with employee login duration details in a TIME datatype column.
EmployeeID | DomainID | LoginDuration
----------------------------------------------------------------
1111 12 02:32:55.0000000
1111 4 00:57:17.0000000
1111 12 01:06:25.0000000
1111 11 03:31:23.0000000
2222 11 02:42:17.0000000
2222 4 03:54:52.0000000
2222 10 04:08:29.0000000
Apart from the above columns, I also have LoginTimeStamp and LoginWeek columns, which I am using in a JOIN statement.
I am trying to obtain running totals for the LoginDuration Column as follows:
EmployeeID | DomainID | HoursBefore | LoginDuration | HoursAfter |
---------------------------------------------------------------------------------
1111 12 00:00:00.0000000 02:32:55.0000000 **00:00:00.0000000**
1111 4 02:32:55.0000000 00:57:17.0000000 03:30:12.0000000
1111 12 03:30:12.0000000 01:06:25.0000000 04:36:37.0000000
1111 11 04:36:37.0000000 03:31:23.0000000 08:08:00.0000000
2222 11 00:00:00.0000000 02:42:17.0000000 **00:00:00.0000000**
2222 4 01:32:31.0000000 03:54:52.0000000 04:14:48.0000000
2222 10 04:14:48.0000000 04:08:29.0000000 08:09:40.0000000
HoursBefore is the previous value of HoursAfter (00:00:00 for the first row of each employee).
HoursAfter = HoursBefore+LoginDuration
For this purpose, I wrote the query below, but the HoursAfter column comes out wrong: it is not adding up the current value and the previous value for each employee.
SELECT
a.EmployeeID,a.LoginDuration,
COALESCE(CAST(
DATEADD(ms,
SUM(DATEDIFF(ms,0,CAST(b.LoginDuration as datetime)))
, 0)
as time)
,'00:00:00') AS HoursBefore,
a.LoginDuration as Hours,
COALESCE(CAST(
DATEADD(ms,
SUM(DATEDIFF(ms,0,CAST(b.LoginDuration as datetime)))
, a.Loginduration)
as time)
,'00:00:00') As HoursAfter
FROM XYZ AS a
LEFT OUTER JOIN XYZ AS b
ON (a.EmployeeID = b.EmployeeID)
AND (a.LoginWeek = b.LoginWeek)
AND (b.LoginTimeStamp < a.LoginTimeStamp)
GROUP BY a.EmployeeID, a.LoginTimeStamp,a.LoginDuration
ORDER BY a.LoginWeek, a.EmployeeID, a.LoginTimeStamp;
I need help with the query such that the HoursAfter column for each employee is appropriate.
Any help would be greatly appreciated.
(This is my first query, reply if you may need any further details.)
Thanks.

Pity SQL Server doesn't support a period datatype yet; it would make the math so much simpler.
However, it does have rather good support for window functions in newer versions (LAG and ORDER BY in aggregate window functions arrived in SQL Server 2012), which we can use to solve this:
declare @t table (ID int, EmployeeID int, DomainID int, LoginDuration time)
insert @t
values
(1, 1111, 12, '02:32:55.0000000'),
(2, 1111, 4, '00:57:17.0000000'),
(3, 1111, 12, '01:06:25.0000000'),
(4, 1111, 11, '03:31:23.0000000'),
(5, 2222, 11, '02:42:17.0000000'),
(6, 2222, 4, '03:54:52.0000000'),
(7, 2222, 10, '04:08:29.0000000')
;with x as (
select *, dateadd(second, sum(datediff(second, 0, loginduration)) over (partition by employeeid order by id), 0) sum_duration_sec,
row_number() over (partition by employeeid order by id) rn
from @t
)
select
employeeid,
domainid,
convert(time, isnull(lag(sum_duration_sec) over (partition by employeeid order by id),0)) hoursbefore,
loginduration,
convert(time, case when rn = 1 then 0 else sum_duration_sec end) hoursafter
from x
I introduced the ID column for brevity to establish the sequence; you'd probably want to order by (LoginWeek, LoginTimestamp) instead.
Also, I'm not sure about the requirement that HoursAfter should be 0 in the 1st and 5th rows; if it isn't needed, delete the row_number() part altogether.

Use OUTER APPLY to calculate HoursAfter. HoursBefore is just HoursAfter minus the current duration.
SELECT a.EmployeeID, a.DomainID,
HoursBefore = CONVERT(TIME, DATEADD(SECOND, b.after_secs - DATEDIFF(SECOND, 0, a.LoginDuration), 0)),
a.LoginDuration,
HoursAfter = CONVERT(TIME, DATEADD(SECOND, b.after_secs, 0))
FROM XYZ AS a
OUTER APPLY
(
SELECT after_secs = SUM(DATEDIFF(SECOND, 0, x.LoginDuration))
FROM XYZ x
WHERE x.EmployeeID = a.EmployeeID
AND x.LoginWeek = a.LoginWeek
AND x.LoginTimeStamp <= a.LoginTimeStamp
) b
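SQLite has no OUTER APPLY, but a correlated scalar subquery computes the same per-row sum, so the logic can be sketched with Python's sqlite3 module. The LoginWeek and LoginTimeStamp values below are hypothetical (one week, timestamps as increasing integers); the durations come from the question and are stored as seconds. `hours_before_after` is a hypothetical name.

```python
import sqlite3

# (EmployeeID, DomainID, LoginWeek, LoginTimeStamp, LoginDuration);
# week and timestamp values are made up for illustration.
ROWS = [
    (1111, 12, 1, 1, "02:32:55"), (1111, 4, 1, 2, "00:57:17"),
    (1111, 12, 1, 3, "01:06:25"), (1111, 11, 1, 4, "03:31:23"),
    (2222, 11, 1, 1, "02:42:17"), (2222, 4, 1, 2, "03:54:52"),
    (2222, 10, 1, 3, "04:08:29"),
]


def to_seconds(hms: str) -> int:
    h, m, s = (int(p) for p in hms.split(":"))
    return h * 3600 + m * 60 + s


def hours_before_after(rows):
    """Return (employee, domain, before_secs, secs, after_secs) per row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE xyz (employee_id INT, domain_id INT, "
                "login_week INT, login_ts INT, secs INT)")
    con.executemany("INSERT INTO xyz VALUES (?, ?, ?, ?, ?)",
                    [(e, d, w, ts, to_seconds(dur)) for e, d, w, ts, dur in rows])
    # The subquery sums every earlier-or-equal login of the same employee and
    # week (HoursAfter); subtracting the current duration gives HoursBefore.
    result = con.execute(
        """
        SELECT employee_id, domain_id, after_secs - secs, secs, after_secs
        FROM (
            SELECT a.*,
                   (SELECT SUM(x.secs) FROM xyz AS x
                    WHERE x.employee_id = a.employee_id
                      AND x.login_week = a.login_week
                      AND x.login_ts <= a.login_ts) AS after_secs
            FROM xyz AS a
        )
        ORDER BY employee_id, login_ts
        """
    ).fetchall()
    con.close()
    return result


for row in hours_before_after(ROWS):
    print(row)
```

The correlated subquery rescans earlier rows for every row, so on large tables the window-function version is usually cheaper.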

How can I group / window date ordered events delineated by an arbitrary expression?

I would like to group some data together based on dates and some (potentially arbitrary) indicator:
Date | Ind
================
2016-01-02 | 1
2016-01-03 | 5
2016-03-02 | 10
2016-03-05 | 15
2016-05-10 | 6
2016-05-11 | 2
I would like to group together consecutive (date-ordered) rows, but start a new group after any row whose Ind is >= 10:
Date | Ind | Group
========================
2016-01-02 | 1 | 1
2016-01-03 | 5 | 1
2016-03-02 | 10 | 1
2016-03-05 | 15 | 2
2016-05-10 | 6 | 3
2016-05-11 | 2 | 3
I did find a promising technique at the end of a blog post: "Use this Neat Window Function Trick to Calculate Time Differences in a Time Series" (the final subsection, "Extra Bonus"), but the important part of the query uses a keyword (FILTER) that doesn't seem to be supported in SQL Server (and after a quick Google search, I'm not sure where it is supported!).
I'm still hopeful that a window function might be the answer. I just need a counter that I can add to every row (like RANK or ROW_NUMBER does) but that only increments when some arbitrary condition evaluates to true. Is there a way to do this in SQL Server?

Here is the solution:
DECLARE @t TABLE ([Date] DATETIME, Ind INT)
INSERT INTO @t
VALUES
('2016-01-02', 1),
('2016-01-03', 5),
('2016-03-02', 10),
('2016-03-05', 15),
('2016-05-10', 6),
('2016-05-11', 2)
SELECT [Date],
Ind,
1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM
(
SELECT *,
CASE WHEN LAG(ind) OVER(ORDER BY [Date]) >= 10
THEN 1
ELSE 0
END AS [Group]
FROM @t
) t
Just mark a row as 1 when the previous row's Ind is greater than or equal to 10, else 0. A running sum of those flags, plus 1, then gives you the desired group number.
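Here is a sketch of the same flag-then-running-sum idea, run through Python's sqlite3 module (SQLite 3.25 or later for LAG and windowed SUM) as a stand-in for SQL Server; the threshold parameter and the name `group_after_threshold` are illustrative additions.

```python
import sqlite3

# (Date, Ind) rows from the question.
ROWS = [("2016-01-02", 1), ("2016-01-03", 5), ("2016-03-02", 10),
        ("2016-03-05", 15), ("2016-05-10", 6), ("2016-05-11", 2)]


def group_after_threshold(rows, threshold=10):
    """Return (date, ind, group), starting a new group after ind >= threshold."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (d TEXT, ind INT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Inner query: flag = 1 when the previous row's ind reached the threshold
    # (the first row's LAG is NULL, so it falls through to 0).
    # Outer query: the running sum of flags counts how many groups have closed.
    result = con.execute(
        """
        SELECT d, ind, 1 + SUM(flag) OVER (ORDER BY d) AS grp
        FROM (
            SELECT d, ind,
                   CASE WHEN LAG(ind) OVER (ORDER BY d) >= ? THEN 1 ELSE 0 END AS flag
            FROM t
        )
        ORDER BY d
        """,
        (threshold,),
    ).fetchall()
    con.close()
    return result


for row in group_after_threshold(ROWS):
    print(row)
```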

Giving full credit to Giorgi for the idea, but I've modified his answer (both for my benefit and for future readers).
Just change the CASE expression to check whether 30 or more days have elapsed since the previous record:
DECLARE @t TABLE ([Date] DATETIME)
INSERT INTO @t
VALUES
('2016-01-02'),
('2016-01-03'),
('2016-03-02'),
('2016-03-05'),
('2016-05-10'),
('2016-05-11')
SELECT [Date],
1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM
(
SELECT [Date],
CASE WHEN DATEADD(d, -30, [Date]) >= LAG([Date]) OVER(ORDER BY [Date])
THEN 1
ELSE 0
END AS [Group]
FROM @t
) t
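The gap-based variant can be sketched the same way with Python's sqlite3 module (SQLite 3.25 or later). SQLite has no DATEADD, so this version compares julianday() values instead, which gives the day difference between a row and the previous one; `group_by_gap` and its `gap_days` parameter are hypothetical.

```python
import sqlite3

# Dates from the example above.
DATES = ["2016-01-02", "2016-01-03", "2016-03-02",
         "2016-03-05", "2016-05-10", "2016-05-11"]


def group_by_gap(dates, gap_days=30):
    """Return (date, group), starting a new group when gap_days or more have passed."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (d TEXT)")
    con.executemany("INSERT INTO t VALUES (?)", [(d,) for d in dates])
    # julianday(d) - julianday(previous d) is the gap in days; the first row's
    # LAG is NULL, so its comparison is NULL and the CASE yields 0.
    result = con.execute(
        """
        SELECT d, 1 + SUM(flag) OVER (ORDER BY d) AS grp
        FROM (
            SELECT d,
                   CASE WHEN julianday(d) - julianday(LAG(d) OVER (ORDER BY d)) >= ?
                        THEN 1 ELSE 0 END AS flag
            FROM t
        )
        ORDER BY d
        """,
        (gap_days,),
    ).fetchall()
    con.close()
    return result


for row in group_by_gap(DATES):
    print(row)
```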

Find the min and max dates between multiple sets of dates

Given the following set of data, I'm trying to determine how I can select the start and end dates of the combined date ranges, when they intersect with each other.
For instance, for PartNum 115678, I would want my final result set to display the date ranges 2012/01/01 - 2012/01/19 (rows 1, 2 and 4 combined, since those date ranges intersect) and 2012/02/01 - 2012/03/28 (row 3, since this one does not intersect with the range found previously).
For PartNum 213275, I would want to select the only row for that part, 2012/12/01 - 2013/01/01.
Edit:
I'm currently playing around with the following SQL statement, but it's not giving me exactly what I need.
with DistinctRanges as (
select distinct
ha1.PartNum "PartNum",
ha1.StartDt "StartDt",
ha2.EndDt "EndDt"
from dbo.HoldsAll ha1
inner join dbo.HoldsAll ha2
on ha1.PartNum = ha2.PartNum
where
ha1.StartDt <= ha2.EndDt
and ha2.StartDt <= ha1.EndDt
)
select
PartNum,
StartDt,
EndDt
from DistinctRanges
Here are the results of the query shown in the edit:

You're better off having a persisted Calendar table, but if you don't have one, the CTE below creates it ad hoc. The TOP(36600) part gives you roughly 100 years' worth of dates counting from the pivot date ('20100101') on the same line.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table data (
partnum int,
startdt datetime,
enddt datetime,
age int
);
insert data select
12345, '20120101', '20120116', 15 union all select
12345, '20120115', '20120116', 1 union all select
12345, '20120201', '20120328', 56 union all select
12345, '20120113', '20120119', 6 union all select
88872, '20120201', '20130113', 43;
Query 1:
with Calendar(thedate) as (
select TOP(36600) dateadd(d,row_number() over (order by 1/0),'20100101')
from sys.columns a
cross join sys.columns b
cross join sys.columns c
), tmp as (
select partnum, thedate,
grouper = datediff(d, dense_rank() over (partition by partnum order by thedate), thedate)
from Calendar c
join data d on d.startdt <= c.thedate and c.thedate <= d.enddt
)
select partnum, min(thedate) startdt, max(thedate) enddt
from tmp
group by partnum, grouper
order by partnum, startdt
Results:
| PARTNUM | STARTDT | ENDDT |
------------------------------------------------------------------------------
| 12345 | January, 01 2012 00:00:00+0000 | January, 19 2012 00:00:00+0000 |
| 12345 | February, 01 2012 00:00:00+0000 | March, 28 2012 00:00:00+0000 |
| 88872 | February, 01 2012 00:00:00+0000 | January, 13 2013 00:00:00+0000 |
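The Calendar + dense_rank idea can be illustrated in plain Python (a translation of the technique, not the SQL itself): expand each range into the days it covers, remove duplicate days, and group by "day minus its dense rank", which stays constant along any unbroken run of days. Like the SQL, this merges ranges that merely touch (for example Jan 1-5 and Jan 6-10), not only ones that overlap. `merge_ranges` is a hypothetical name and the rows mirror the schema setup above.

```python
from datetime import date, timedelta
from itertools import groupby

# (partnum, startdt, enddt) rows from the schema setup.
DATA = [
    (12345, date(2012, 1, 1), date(2012, 1, 16)),
    (12345, date(2012, 1, 15), date(2012, 1, 16)),
    (12345, date(2012, 2, 1), date(2012, 3, 28)),
    (12345, date(2012, 1, 13), date(2012, 1, 19)),
    (88872, date(2012, 2, 1), date(2013, 1, 13)),
]


def merge_ranges(rows):
    """Return (partnum, start, end) for each run of consecutive covered days."""
    # Equivalent of joining to the Calendar: one (part, day) per covered day.
    # The set drops duplicate days, as dense_rank() effectively does.
    days = sorted({(part, start + timedelta(days=i))
                   for part, start, end in rows
                   for i in range((end - start).days + 1)})
    result = []
    for part, part_days in groupby(days, key=lambda pd: pd[0]):
        dates = [d for _, d in part_days]
        # grouper = day - dense rank; consecutive days share the same grouper.
        keyed = ((d - timedelta(days=rank), d) for rank, d in enumerate(dates))
        for _, run in groupby(keyed, key=lambda kd: kd[0]):
            run_dates = [d for _, d in run]
            result.append((part, run_dates[0], run_dates[-1]))
    return result


for row in merge_ranges(DATA):
    print(row)
```

Expanding to individual days is simple but costs one row per day; for long ranges, sorting the intervals and merging overlaps directly avoids that.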
