Ignoring rows within daterange

I have the following data:
CREATE TABLE SampleData
(
orderid int,
[name] nvarchar(1),
[date] date
);
INSERT INTO SampleData
VALUES
(1, 'a', '2017-01-01'),
(2, 'a', '2017-01-05'),
(3, 'a', '2017-02-01'),
(4, 'a', '2017-04-01'),
(5, 'a', '2017-10-01'),
(6, 'b', '2017-04-01');
I need to retrieve each new order according to the following rules:
The first date for a name is the 'current order' for that name
Orders with the same name but less than 3 months after the 'current order' are considered the same order and must be ignored
An order 3 or more months after the 'current order' is considered a new order and becomes the 'current order' (in SampleData, orderid 1 and 4 must be compared instead of 3 and 4, because 3 is not the current order)
If the name and date are the same, the row with the lowest orderid is the superior order
So with the sample data I need the following result:
orderid name date
1 a 2017-01-01
4 a 2017-04-01
5 a 2017-10-01
6 b 2017-04-01
I tried several approaches, but without success. Any ideas on how I can achieve this?

Below is a quick fix solution that can be built upon if your code scales beyond the sample data provided. I will state beforehand that this isn't the prettiest solution but it does return the result set you indicated you were after.
If anything, you may want to look into T-SQL window functions as well as analytic functions. Be aware that they don't play well with all data types.
My goal with the solution below was to rank the rows, partitioning by name and ordering by the date field. This gives you something similar to your order id, but the rank is specific to the customer who placed the order.
I'll do my best to answer any questions:
if object_id('tempdb..#tmp_SampleData','u') is not null
drop table #tmp_SampleData
CREATE TABLE #tmp_SampleData
(
orderid int,
[name] nvarchar(1),
[date] date
);
INSERT INTO #tmp_SampleData
VALUES
(1, 'a', '2017-01-01'),
(2, 'a', '2017-01-05'),
(3, 'a', '2017-02-01'),
(4, 'a', '2017-04-01'),
(5, 'a', '2017-10-01'),
(6, 'b', '2017-04-01');
if object_id('tempdb..#tmp_iter','u') is not null
drop table #tmp_iter
select
orderid
,name
,date
,rank() over (partition by name order by date) [Rank]
,lag(orderid,1,0) over (partition by name order by date) [LagRank]
--,rank() over (partition by name order by date desc) [ReverseRank]
into #tmp_Iter
from #tmp_SampleData
if object_id('tempdb..#tmp_final','u') is not null
drop table #tmp_final
select
i.orderid
,i.name
,i.date
,datediff(month,i.date,i2.date) [MonthsPassed]
into #tmp_final
from #tmp_Iter i
left join #tmp_Iter i2
on i.Rank = i2.LagRank
select *
from #tmp_final
where 1=1
and MonthsPassed > 3
or MonthsPassed = 0
or MonthsPassed < 0
or MonthsPassed is null

@SQLUser44, thanks for your input. Unfortunately your code does not work. For the table below the result should be orderids 1, 6, 7, 8 and 9; yours returns 1, 2, 3, 5, 6, 7, 8 and 9.
INSERT INTO #tmp_SampleData
VALUES
(1,'a','2017-01-01'),
(2,'a','2017-01-08'),
(3,'a','2017-05-01'),
(4,'a','2017-01-05'),
(5,'a','2017-02-01'),
(6,'b','2017-01-01'),
(7,'b','2017-09-01'),
(8,'c','2017-10-01'),
(9,'a','2017-04-01');
I came up with the following, which works, but I suspect it will perform poorly:
if object_id('tempdb..#tmp_SampleData','u') is not null
drop table #tmp_SampleData
CREATE TABLE #tmp_SampleData
(
orderid int,
[name] nvarchar(1),
[date] date
);
INSERT INTO #tmp_SampleData
VALUES
(1,'a','2017-01-01'),
(2,'a','2017-01-08'),
(3,'a','2017-05-01'),
(4,'a','2017-01-05'),
(5,'a','2017-02-01'),
(6,'b','2017-01-01'),
(7,'b','2017-09-01'),
(8,'c','2017-10-01'),
(9,'a','2017-04-01');
DECLARE Test_Cursor CURSOR FOR
SELECT orderid, [name], [date] FROM #tmp_SampleData ORDER BY [name], [date], orderid; -- lowest orderid first on ties
OPEN Test_Cursor;
DECLARE @orderid int;
DECLARE @name nvarchar(255);
DECLARE @date date;
FETCH NEXT FROM Test_Cursor INTO @orderid, @name, @date;
DECLARE @current_date date = @date;
DECLARE @current_name nvarchar(255) = @name;
DECLARE @listOfIDs TABLE (orderid int);
INSERT @listOfIDs VALUES (@orderid);
WHILE @@FETCH_STATUS = 0
BEGIN
IF(@name = @current_name AND DATEDIFF(MONTH, @current_date, @date) >= 3)
BEGIN
SET @current_date = @date
INSERT @listOfIDs VALUES (@orderid)
END
IF(@name != @current_name)
BEGIN
SET @current_name = @name
SET @current_date = @date
INSERT @listOfIDs VALUES (@orderid)
END
FETCH NEXT FROM Test_Cursor INTO @orderid, @name, @date;
END;
CLOSE Test_Cursor;
DEALLOCATE Test_Cursor;
SELECT * FROM #tmp_SampleData WHERE orderid IN (SELECT orderid FROM @listOfIDs);
Better performing alternatives are very welcome!
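One set-based alternative is a recursive CTE. This is a minimal sketch, not taken from the answers above; it assumes SQL Server 2008 or later, copies the follow-up sample rows into a hypothetical temp table #orders, and measures "months" with DATEDIFF(MONTH, ...) exactly like the cursor. ROW_NUMBER numbers each name's orders by date (lowest orderid first on ties), and the recursive member walks those rows one by one, carrying the date of the current order forward, so every row is compared with the current order rather than with the previous row.

```sql
-- Hypothetical table name #orders; same rows as the follow-up sample above.
IF OBJECT_ID('tempdb..#orders') IS NOT NULL DROP TABLE #orders;
IF OBJECT_ID('tempdb..#new_orders') IS NOT NULL DROP TABLE #new_orders;
CREATE TABLE #orders (orderid int, [name] nvarchar(1), [date] date);
INSERT INTO #orders VALUES
(1,'a','2017-01-01'),(2,'a','2017-01-08'),(3,'a','2017-05-01'),
(4,'a','2017-01-05'),(5,'a','2017-02-01'),(6,'b','2017-01-01'),
(7,'b','2017-09-01'),(8,'c','2017-10-01'),(9,'a','2017-04-01');

WITH ordered AS (
    -- number each name's orders by date; the lowest orderid wins a tie
    SELECT orderid, [name], [date],
           ROW_NUMBER() OVER (PARTITION BY [name] ORDER BY [date], orderid) AS rn
    FROM #orders
),
walk AS (
    -- anchor: the first order of each name is the first 'current order'
    SELECT rn, orderid, [name], [date], [date] AS cur_date, 1 AS is_new
    FROM ordered
    WHERE rn = 1
    UNION ALL
    -- recursive step: compare the next row with the current order's date
    SELECT o.rn, o.orderid, o.[name], o.[date],
           CASE WHEN DATEDIFF(MONTH, w.cur_date, o.[date]) >= 3
                THEN o.[date] ELSE w.cur_date END,
           CASE WHEN DATEDIFF(MONTH, w.cur_date, o.[date]) >= 3
                THEN 1 ELSE 0 END
    FROM walk AS w
    JOIN ordered AS o
      ON o.[name] = w.[name] AND o.rn = w.rn + 1
)
SELECT orderid, [name], [date]
INTO #new_orders
FROM walk
WHERE is_new = 1
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels

SELECT orderid, [name], [date] FROM #new_orders ORDER BY [name], [date];
-- expected: orderids 1 and 9 (name a), 6 and 7 (name b), 8 (name c)
```

Each row still costs one recursion step, so an index on (name, date, orderid), or materializing `ordered` into an indexed temp table first, may matter on large tables. Note that DATEDIFF counts month boundaries (2017-01-31 to 2017-04-01 counts as 3); if you mean calendar months, compare `o.[date] >= DATEADD(MONTH, 3, w.cur_date)` instead.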

Related

T-SQL - Customer Linking

Please run the code below. These rows are all the same customer: two of them share the same TaxNumber, and the third matches one of those on CompanyName. I need to link them all and set the ParentCompanyID to the customer that was created first. I am struggling to get them linked.
CREATE TABLE #Temp
(
CustomerID INT,
CustomerName VARCHAR(20),
CustomerTaxNumber INT,
CreatedDate DATE
)
INSERT INTO #Temp
VALUES (8, 'Company PTY',1234, '2019-09-20'),
(2, 'Company PT', 1234, '2019-09-24'),
(3, 'Company PTY',NULL, '2019-09-29')
SELECT * FROM #Temp
Below is the result that I require....
Any help will be appreciated.

Using a CASE expression with FIRST_VALUE can give you the desired results:
SELECT CustomerID, CustomerName, CustomerTaxNumber, CreatedDate,
CASE WHEN CustomerTaxNumber IS NULL THEN
FIRST_VALUE(CustomerID) OVER(PARTITION BY CustomerName ORDER BY CreatedDate)
ELSE
FIRST_VALUE(CustomerID) OVER(PARTITION BY CustomerTaxNumber ORDER BY CreatedDate)
END As ParentCompanyID
FROM #Temp

Try this:
CREATE TABLE #Temp
(
CustomerID INT,
CustomerName VARCHAR(20),
CustomerTaxNumber INT,
CreatedDate DATE
)
INSERT INTO #Temp
VALUES (8, 'Company PTY',1234, '2019-09-20'),
(2, 'Company PT', 1234, '2019-09-24'),
(3, 'Company PTY',NULL, '2019-09-29')
SELECT DS.[CreatedDate] AS [FirstEntry]
,DS.[CustomerID] AS [ParentCompanyID]
,#Temp.*
FROM #Temp
CROSS APPLY
(
SELECT TOP 1 *
FROM #Temp
ORDER BY CreatedDate
) DS
DROP TABLE #Temp
Your condition is pretty simple: get the first record. If you need to group the records in some way, you can add additional filtering in the CROSS APPLY clause.
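To illustrate that "additional filtering", here is a sketch that correlates the CROSS APPLY with the current row: a candidate parent must share either the tax number or the company name, and the earliest CreatedDate wins. It reuses the question's #Temp rows; #Linked is a hypothetical result table. It only follows one hop of matching, so chains where A matches C only through B would need a recursive approach.

```sql
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL DROP TABLE #Temp;
IF OBJECT_ID('tempdb..#Linked') IS NOT NULL DROP TABLE #Linked;
CREATE TABLE #Temp
(
    CustomerID INT,
    CustomerName VARCHAR(20),
    CustomerTaxNumber INT,
    CreatedDate DATE
);
INSERT INTO #Temp
VALUES (8, 'Company PTY', 1234, '2019-09-20'),
       (2, 'Company PT', 1234, '2019-09-24'),
       (3, 'Company PTY', NULL, '2019-09-29');

SELECT t.*, DS.CustomerID AS ParentCompanyID
INTO #Linked
FROM #Temp AS t
CROSS APPLY
(
    -- earliest customer matching this row on tax number OR name;
    -- a NULL tax number never compares equal, so row 3 matches on name only
    SELECT TOP (1) p.CustomerID
    FROM #Temp AS p
    WHERE p.CustomerTaxNumber = t.CustomerTaxNumber
       OR p.CustomerName = t.CustomerName
    ORDER BY p.CreatedDate, p.CustomerID
) AS DS;

SELECT * FROM #Linked ORDER BY CreatedDate;
-- all three rows get ParentCompanyID = 8
```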

Concatenating with Cursor

I really want to learn and understand how to concatenate strings with the cursor approach.
Here is my table:
declare @t table (id int, city varchar(15))
insert into #t values
(1, 'Rome')
,(1, 'Dallas')
,(2, 'Berlin')
,(2, 'Rome')
,(2, 'Tokyo')
,(3, 'Miami')
,(3, 'Bergen')
I am trying to create a table that has all cities for each ID within one line sorted alphabetically.
ID City
1 Dallas, Rome
2 Berlin, Rome, Tokyo
3 Bergen, Miami
This is my code so far, but it is not working. If somebody could walk me through each step, I would be very happy and eager to learn!
set nocount on
declare @tid int
declare @tcity varchar(15)
declare CityCursor CURSOR FOR
select * from @t
order by id, city
open CityCursor
fetch next from CityCursor into @tid, @tcity
while ( @@FETCH_STATUS = 0)
begin
if @tid = @tid -- my idea add all cities in one line within each id
print cast(@tid as varchar(2)) + ', '+ @tcity
else if @tid <> @tid --when it reaches a new id and we went through all cities it starts over for the next line
fetch next from CityCursor into @tid, @tcity
end
close CityCursor
deallocate CityCursor
select * from CityCursor

First, for future readers: a cursor, as Sean Lange wrote in his comment, is the wrong tool for this job. The correct way to do it is with a subquery using FOR XML.
However, since you wanted to know how to do it with a cursor: you were actually pretty close. Here is a working example:
set nocount on
declare @prevId int,
@tid int,
@tcity varchar(15)
declare @cursorResult table (id int, city varchar(32))
-- if you are expecting more than two cities for the same id,
-- the city column should be longer
declare CityCursor CURSOR FOR
select * from @t
order by id, city
open CityCursor
fetch next from CityCursor into @tid, @tcity
while ( @@FETCH_STATUS = 0)
begin
if @prevId is null or @prevId != @tid
insert into @cursorResult(id, city) values (@tid, @tcity)
else
update @cursorResult
set city = city +', '+ @tcity
where id = @tid
set @prevId = @tid
fetch next from CityCursor into @tid, @tcity
end
close CityCursor
deallocate CityCursor
select * from @cursorResult
results:
id city
1 Dallas, Rome
2 Berlin, Rome, Tokyo
3 Bergen, Miami
I've used another variable to keep the previous id value, and also inserted the results of the cursor into a table variable.

I have written a nested cursor that loops over each distinct city id. Although it has performance issues, you can try the following procedure:
CREATE PROCEDURE USP_CITY
AS
BEGIN
set nocount on
declare @mastertid int
declare @detailstid int
declare @tcity varchar(MAX)
declare @finalCity varchar(MAX)
SET @finalCity = ''
declare @t table (id int, city varchar(max))
insert into @t values
(1, 'Rome')
,(1, 'Dallas')
,(2, 'Berlin')
,(2, 'Rome')
,(2, 'Tokyo')
,(3, 'Miami')
,(3, 'Bergen')
declare @finaltable table (id int, city varchar(max))
declare MasterCityCursor CURSOR FOR
select distinct id from @t
order by id
open MasterCityCursor
fetch next from MasterCityCursor into @mastertid
while ( @@FETCH_STATUS = 0)
begin
declare DetailsCityCursor CURSOR FOR
SELECT id,city from @t order by id, city -- sort the cities alphabetically within each id
open DetailsCityCursor
fetch next from DetailsCityCursor into @detailstid,@tcity
while ( @@FETCH_STATUS = 0)
begin
if @mastertid = @detailstid
begin
SET @finalCity = @finalCity + CASE @finalCity WHEN '' THEN '' ELSE ', ' END + @tcity
end
fetch next from DetailsCityCursor into @detailstid, @tcity
end
insert into @finaltable values(@mastertid,@finalCity)
SET @finalCity = ''
close DetailsCityCursor
deallocate DetailsCityCursor
fetch next from MasterCityCursor into @mastertid
end
close MasterCityCursor
deallocate MasterCityCursor
SELECT * FROM @finaltable
END
If you face any problems, feel free to ask in the comment section. Thanks

Using a cursor for this is probably the slowest possible solution. If performance is important, there are three valid approaches. The first approach is FOR XML without special XML character protection.
declare @t table (id int, city varchar(15))
insert into @t values (1, 'Rome'),(1, 'Dallas'),(2, 'Berlin'),(2, 'Rome'),(2, 'Tokyo'),
(3, 'Miami'),(3, 'Bergen');
SELECT
t.id,
city = STUFF((
SELECT ', ' + t2.city
FROM @t t2
WHERE t.id = t2.id
ORDER BY t2.city -- alphabetical order within each id
FOR XML PATH('')),1,2,'')
FROM @t as t
GROUP BY t.id;
The drawback to this approach is that when the data contains a reserved XML character such as &, <, or >, you get an XML entity back (e.g. "&amp;" for "&"). To handle that, modify your query to look like this:
Sample data
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (id int, words varchar(20))
INSERT #t VALUES (1, 'blah blah'),(1, 'yada yada'),(2, 'PB&J'),(2,' is good');
SELECT
t.id,
city = STUFF((
SELECT ',' + t2.words
FROM #t t2
WHERE t.id = t2.id
FOR XML PATH(''), TYPE).value('.','varchar(1000)'),1,1,'')
FROM #t as t
GROUP BY t.id;
The downside to this approach is that it will be slower. The good news (and another reason this approach is far better than a cursor) is that both of these queries benefit greatly when the optimizer chooses a parallel execution plan.
The best approach is a fabulous new function available in SQL Server 2017, STRING_AGG. STRING_AGG does not have the problem with special XML characters and is by far the cleanest approach:
SELECT t.id, STRING_AGG(t.words,',') WITHIN GROUP (ORDER BY t.words) AS words
FROM #t as t
GROUP BY t.id;
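Applied to the question's city data, a sketch of the same STRING_AGG idea (SQL Server 2017 or later) looks like this: ordering by city inside WITHIN GROUP gives the alphabetical list that was asked for, and ', ' as the separator matches the expected output. The temp table names #cities and #cities_by_id are only for illustration.

```sql
IF OBJECT_ID('tempdb..#cities') IS NOT NULL DROP TABLE #cities;
IF OBJECT_ID('tempdb..#cities_by_id') IS NOT NULL DROP TABLE #cities_by_id;
CREATE TABLE #cities (id int, city varchar(15));
INSERT INTO #cities VALUES
(1, 'Rome'), (1, 'Dallas'), (2, 'Berlin'), (2, 'Rome'),
(2, 'Tokyo'), (3, 'Miami'), (3, 'Bergen');

-- one row per id; cities sorted alphabetically inside each list
SELECT id,
       STRING_AGG(city, ', ') WITHIN GROUP (ORDER BY city) AS city
INTO #cities_by_id
FROM #cities
GROUP BY id;

SELECT id, city FROM #cities_by_id ORDER BY id;
-- 1  Dallas, Rome
-- 2  Berlin, Rome, Tokyo
-- 3  Bergen, Miami
```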

T-SQL aggregate query

Can anyone help me figure out what I am doing wrong with this query? I am trying to filter records from a table that contains the emails sent out to clients, along with the status of each email. I need to eliminate all EmailIds that have a status of both Sent (1) and Bounced (0). Anything other than these two statuses is considered Delivered (4). So the output should contain only EmailIds with a status of Delivered (4), for all EmailIds that don't have both statuses 1 and 0. In the example below, I should also see EmailId 4 with a status of Delivered.
This is my sample setup. I really appreciate any help you can provide:
create table #status
(
Id int,
Name varchar(100)
)
insert into #status (Id, Name)
values (0, 'Bounced'), (1, 'Sent'), (2, 'Clicked'),
(3, 'Opened'), (4, 'Delivered')
create table #email
(
EmailId int ,
Email varchar(100),
StatusId int
)
insert into #email (EmailId, email, StatusId)
values (1, 'rjoseph@gmail.com', 1), (1, 'rjoseph@gmail.com', 0),
(2, 'nathan@comcast.net', 1), (2, 'nathan@comcast.net', 2),
(2, 'nathan@comcast.net', 3), (3, 'nora@comcast.net', 1),
(3, 'nora@comcast.net', 2), (3, 'nora@comcast.net', 3),
(4, 'neha@comcast.net', 1)
select
e.EmailId
into
#temp
from
#email e
inner join #status st
on st.Id = e.StatusId
where
(e.StatusId not in (1,0))
group by
e.EmailId
drop table #temp
drop table #email
drop table #status

This is kind of a kludgy way to get to this (you can do this without the temporary tables, but I'm doing that here to follow your own syntax). The first query grabs the rows which match 1 AND 0. The second query returns the email IDs which do not exist in the first query:
SELECT e.EmailID
INTO #temp
FROM #email AS e
WHERE e.StatusID = 0
AND EXISTS (SELECT 1 FROM #email AS e2 WHERE e2.EmailID = e.EmailID AND e2.StatusID = 1)
SELECT DISTINCT e.EmailID
FROM #email AS e LEFT JOIN #temp AS t
ON e.EmailID = t.EmailID
WHERE t.EmailID IS NULL
BTW: The SELECT 1 FROM ... does not have anything to do with the StatusID #1. It may seem confusing because I used SELECT 1, but it could have been SELECT 5 or SELECT 'Z'. It's mostly meaningless.
Here's the same query without the temporary table:
SELECT DISTINCT e.EmailID
FROM #email AS e
WHERE e.EmailID NOT IN (
SELECT e1.EmailID
FROM #email AS e1
WHERE e1.StatusID = 0
AND EXISTS (SELECT 1 FROM #email AS e2 WHERE e2.EmailID = e1.EmailID AND e2.StatusID = 1)
)
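Since the title asks for an aggregate query, the same filter can also be sketched with GROUP BY and HAVING. This follows the reading above (drop an EmailId only when it has both a Sent (1) and a Bounced (0) row) and labels every surviving EmailId as Delivered (4). The tables are recreated so the block runs on its own; #delivered is a hypothetical result table.

```sql
IF OBJECT_ID('tempdb..#status') IS NOT NULL DROP TABLE #status;
IF OBJECT_ID('tempdb..#email') IS NOT NULL DROP TABLE #email;
IF OBJECT_ID('tempdb..#delivered') IS NOT NULL DROP TABLE #delivered;
CREATE TABLE #status (Id int, Name varchar(100));
INSERT INTO #status (Id, Name)
VALUES (0, 'Bounced'), (1, 'Sent'), (2, 'Clicked'), (3, 'Opened'), (4, 'Delivered');
CREATE TABLE #email (EmailId int, Email varchar(100), StatusId int);
INSERT INTO #email (EmailId, Email, StatusId)
VALUES (1, 'rjoseph@gmail.com', 1), (1, 'rjoseph@gmail.com', 0),
       (2, 'nathan@comcast.net', 1), (2, 'nathan@comcast.net', 2),
       (2, 'nathan@comcast.net', 3), (3, 'nora@comcast.net', 1),
       (3, 'nora@comcast.net', 2), (3, 'nora@comcast.net', 3),
       (4, 'neha@comcast.net', 1);

SELECT e.EmailId, 4 AS StatusId, 'Delivered' AS StatusName
INTO #delivered
FROM #email AS e
GROUP BY e.EmailId
-- keep the EmailId unless it has BOTH a Sent (1) and a Bounced (0) row
HAVING NOT (MAX(CASE WHEN e.StatusId = 1 THEN 1 ELSE 0 END) = 1
        AND MAX(CASE WHEN e.StatusId = 0 THEN 1 ELSE 0 END) = 1);

SELECT EmailId, StatusId, StatusName FROM #delivered ORDER BY EmailId;
-- EmailIds 2, 3 and 4, each with StatusId 4 (Delivered)
```

One pass over #email with conditional aggregates replaces the temp table and the anti-join.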

SQLServer : Grouping and Replacing a COLUMN value with the DATA from other table, without UDF

I would like to replace the numbers in the "Comments" column of @CommentsTable with the equivalent text from @ModTable, in a single SELECT and without using a UDF, maybe with a CTE. I tried STUFF with REPLACE, but had no luck.
Any suggestions would be a great help!
Sample:
DECLARE @ModTable TABLE
(
ID INT,
ModName VARCHAR(10),
ModPos VARCHAR(10)
)
DECLARE @CommentsTable TABLE
(
ID INT,
Comments VARCHAR(100)
)
INSERT INTO @CommentsTable
VALUES (1, 'MyFirst 5 Comments with 6'),
(2, 'MySecond comments'),
(3, 'MyThird comments 5')
INSERT INTO @ModTable
VALUES (1, '[FIVE]', '5'),
(1, '[SIX]', '6'),
(1, '[ONE]', '1'),
(1, '[TWO]', '2')
SELECT T1.ID, <<REPLACED COMMENTS>>
FROM @CommentsTable T1
GROUP BY T1.ID, T1.Comments
Expected Result:
ID Comments
1 MyFirst [FIVE] Comments with [SIX]
2 MySecond comments
3 MyThird comments [FIVE]

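Create a cursor over @ModTable and apply one replacement at a time:
-- work on a copy of the comments so that the replacements accumulate
DECLARE @Result TABLE (ID int, Comments varchar(100));
INSERT INTO @Result (ID, Comments) SELECT ID, Comments FROM @CommentsTable;
DECLARE @modpos varchar(10), @modname varchar(10);
DECLARE replcursor CURSOR LOCAL FAST_FORWARD FOR
SELECT ModPos, ModName FROM @ModTable;
OPEN replcursor;
FETCH NEXT FROM replcursor INTO @modpos, @modname;
WHILE @@FETCH_STATUS = 0
BEGIN
-- apply this replacement to every comment
UPDATE @Result SET Comments = REPLACE(Comments, @modpos, @modname);
FETCH NEXT FROM replcursor INTO @modpos, @modname;
END;
CLOSE replcursor;
DEALLOCATE replcursor;
SELECT ID, Comments FROM @Result ORDER BY ID;
Because every UPDATE works on the same copy, the replacements accumulate and the final SELECT returns all the results at the end of the loop.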

You can use a while loop to iterate over the records and the mods. I slightly modified your @ModTable to have unique values for ID. If this is not your data structure, then you can use a window function like ROW_NUMBER() to get a unique value over which you can iterate.
Revised script example:
DECLARE @ModTable TABLE
(
ID INT,
ModName VARCHAR(10),
ModPos VARCHAR(10)
)
DECLARE @CommentsTable TABLE
(
ID INT,
Comments VARCHAR(100)
)
INSERT INTO @CommentsTable
VALUES (1, 'MyFirst 5 Comments with 6'),
(2, 'MySecond comments'),
(3, 'MyThird comments 5')
INSERT INTO @ModTable
VALUES (1, '[FIVE]', '5'),
(2, '[SIX]', '6'),
(3, '[ONE]', '1'),
(4, '[TWO]', '2')
declare @revisedTable table (id int, comments varchar(100))
declare @modcount int = (select count(*) from @ModTable)
declare @commentcount int = (select count(*) from @CommentsTable)
declare @currentcomment varchar(100) = ''
while @commentcount > 0
begin
set @modcount = (select count(*) from @ModTable)
set @currentcomment = (select Comments from @CommentsTable where ID = @commentcount)
while @modcount > 0
begin
set @currentcomment = REPLACE( @currentcomment,
(SELECT TOP 1 ModPos FROM @ModTable WHERE ID = @modcount),
(SELECT TOP 1 ModName FROM @ModTable WHERE ID = @modcount))
set @modcount = @modcount - 1
end
INSERT INTO @revisedTable (id, comments)
SELECT @commentcount, @currentcomment
set @commentcount = @commentcount - 1
end
SELECT *
FROM @revisedTable
order by id
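If the mod IDs are not unique, as in the question's original data where every row has ID 1, ROW_NUMBER() can supply the sequence to iterate over. A sketch of that idea, assuming the replacements may be applied in any order (true here because no ModName contains a digit): the mods are numbered into a hypothetical temp table #mods, and each loop pass applies one replacement to all comments with a single set-based UPDATE, so only the mods are looped over, not the comments.

```sql
IF OBJECT_ID('tempdb..#comments') IS NOT NULL DROP TABLE #comments;
IF OBJECT_ID('tempdb..#mods') IS NOT NULL DROP TABLE #mods;
CREATE TABLE #comments (ID int, Comments varchar(100));
INSERT INTO #comments VALUES
(1, 'MyFirst 5 Comments with 6'),
(2, 'MySecond comments'),
(3, 'MyThird comments 5');

-- original mod data: every row has ID = 1, so number the rows ourselves
SELECT ROW_NUMBER() OVER (ORDER BY ModPos) AS rn, ModName, ModPos
INTO #mods
FROM (VALUES (1, '[FIVE]', '5'), (1, '[SIX]', '6'),
             (1, '[ONE]', '1'), (1, '[TWO]', '2')) AS m (ID, ModName, ModPos);

DECLARE @rn bigint = 1;
DECLARE @maxrn bigint = (SELECT MAX(rn) FROM #mods);
WHILE @rn <= @maxrn
BEGIN
    -- apply one replacement to every comment at once
    UPDATE c
    SET Comments = REPLACE(c.Comments, m.ModPos, m.ModName)
    FROM #comments AS c
    CROSS JOIN #mods AS m
    WHERE m.rn = @rn;
    SET @rn += 1;
END;

SELECT ID, Comments FROM #comments ORDER BY ID;
```

The number of loop passes equals the number of mods, regardless of how many comments there are.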

I think this will work, even though I generally avoid recursive queries. It assumes that @ModTable has consecutive IDs starting at 1 (like the revised table in the previous answer), though:
;with Comments as
(
select ID, cast(Comments as varchar(max)) as Comments, 0 as ConnectID
from @CommentsTable
union all
-- cast to varchar(max) so the anchor and recursive column types match
select c.ID, cast(replace(c.Comments, m.ModPos, m.ModName) as varchar(max)), m.ID
from Comments c inner join @ModTable m on m.ID = c.ConnectID + 1
)
select ID, Comments from Comments
where ConnectID = (select max(ID) from @ModTable)

=> CLR Function()
As I have a lot of records in "CommentsTable", and "ModTable" can have multiple ModNames for each comment, I finally decided to go with a CLR function. Thanks to all of you for the suggestions and pointers.

Is it possible to make an Expression Column in a view to show row number?

I have an Id column in this view, but it jumps from 40,000 to 7,000,000.
I don't want my crazy stored procedure to loop until it reaches 7,000,000, so I was wondering if I could create a column that holds the row number. It would be an expression of some sort, but I don't know how to write it. Please assist!
Thank you in advance.

You really should do what you're doing with updates, not loops. But if you insist...
declare @ID int
declare @LastID int
select @LastID = 0
while (1 = 1)
begin
select @ID = min(Id)
from [vCategoryClaimsData]
where Id > @LastID
-- if no ID found then we've reached the end of the table
if @ID is null break
-- look up the data for @ID
SELECT @claim_Number = dbo.[vCategoryClaimsData].[Claim No],
...
where Id = @ID
-- do your processing here
...
-- set @LastID to the ID you just processed
select @LastID = @ID
end
Make sure the Id column is indexed, so that each min(Id) lookup is a quick index seek. The loop skips the gaps in the Id values because it always jumps straight to the next existing Id.
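A self-contained sketch of that loop, assuming a hypothetical temp table #claims whose Id jumps from 40000 to 7000000 like the question's view. The real processing is replaced by recording each visited row in #visited, which shows that the loop runs once per existing row, not once per Id value.

```sql
IF OBJECT_ID('tempdb..#claims') IS NOT NULL DROP TABLE #claims;
IF OBJECT_ID('tempdb..#visited') IS NOT NULL DROP TABLE #visited;
CREATE TABLE #claims (Id int PRIMARY KEY, ClaimNo varchar(10));
INSERT INTO #claims VALUES (39999, 'C1'), (40000, 'C2'), (7000000, 'C3'), (7000001, 'C4');
CREATE TABLE #visited (Id int, ClaimNo varchar(10));

DECLARE @ID int, @LastID int = 0, @ClaimNo varchar(10);
WHILE 1 = 1
BEGIN
    -- next existing Id after the one just processed (MIN returns NULL when none is left)
    SELECT @ID = MIN(Id) FROM #claims WHERE Id > @LastID;
    IF @ID IS NULL BREAK;

    -- look up the data for @ID
    SELECT @ClaimNo = ClaimNo FROM #claims WHERE Id = @ID;

    -- stand-in for the real processing: record what was visited
    INSERT INTO #visited (Id, ClaimNo) VALUES (@ID, @ClaimNo);

    SET @LastID = @ID;
END;

SELECT Id, ClaimNo FROM #visited ORDER BY Id;  -- 4 rows, 4 iterations
```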
That being said, it looks like the processing you're doing could be handled with update statements. That would be much more efficient, and eliminate many of the problems others have brought up.

If you can, try to rewrite the stored procedure to use set-based rather than row-based processing.
To do what you need, use the ROW_NUMBER function, as in the sample code below.
USE tempdb
GO
IF OBJECT_ID('tempdb.dbo.IDRownumbersView') IS NOT NULL
DROP VIEW dbo.IDRownumbersView
IF OBJECT_ID('tempdb.dbo.IDRownumbersTable') IS NOT NULL
DROP TABLE dbo.IDRownumbersTable
CREATE TABLE dbo.IDRownumbersTable
(
RowID int PRIMARY KEY CLUSTERED
,CharValue varchar(5)
,DateValue datetime
)
INSERT INTO IDRownumbersTable VALUES (10, 'A', GETDATE())
INSERT INTO IDRownumbersTable VALUES (20, 'B', GETDATE())
INSERT INTO IDRownumbersTable VALUES (30, 'C', GETDATE())
INSERT INTO IDRownumbersTable VALUES (40, 'D', GETDATE())
INSERT INTO IDRownumbersTable VALUES (50, 'E', GETDATE())
INSERT INTO IDRownumbersTable VALUES (100, 'F', GETDATE())
INSERT INTO IDRownumbersTable VALUES (110, 'G', GETDATE())
INSERT INTO IDRownumbersTable VALUES (120, 'H', GETDATE())
GO
CREATE VIEW dbo.IDRownumbersView
AS
SELECT ROW_NUMBER() OVER (ORDER BY RowID ASC) AS RowNumber
,RowID
,CharValue
,DateValue
FROM dbo.IDRownumbersTable
GO
SELECT * FROM dbo.IDRownumbersView

Relational tables have no row numbers. You can project a row number into a result by using the built in ROW_NUMBER() OVER (ORDER BY ...) function.
Your procedure has many, many problems. It uses the loop @counter as the lookup key (!!!). It assumes key stability between iterations (i.e. it assumes @counter+1 is the next key, ignoring any concurrent insert/delete). It assumes stability inside the loop (no transactions, no locking to ensure the validity of EXISTS).
What you're trying to do is emulate a keyset-driven cursor. Just use a keyset cursor.
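A minimal keyset-cursor sketch, assuming a hypothetical temp table #gapped with a primary key. A KEYSET cursor needs a unique key on the underlying table; when its requirements are not met, SQL Server may implicitly convert it to another cursor type. The cursor only visits keys that exist, so gaps in the Id values cost nothing, and a fetch status of -2 signals a row deleted after the cursor was opened, which the loop skips instead of stopping.

```sql
IF OBJECT_ID('tempdb..#gapped') IS NOT NULL DROP TABLE #gapped;
IF OBJECT_ID('tempdb..#processed') IS NOT NULL DROP TABLE #processed;
CREATE TABLE #gapped (Id int PRIMARY KEY, Amount int);
INSERT INTO #gapped VALUES (1, 10), (2, 20), (40000, 30), (7000000, 40);
CREATE TABLE #processed (Id int, Amount int);

DECLARE @Id int, @Amount int;
DECLARE claim_cursor CURSOR LOCAL KEYSET FOR
    SELECT Id, Amount FROM #gapped ORDER BY Id;
OPEN claim_cursor;
FETCH NEXT FROM claim_cursor INTO @Id, @Amount;
WHILE @@FETCH_STATUS <> -1          -- -1: no more rows
BEGIN
    IF @@FETCH_STATUS = 0           -- -2 would mean the row was deleted meanwhile
    BEGIN
        -- stand-in for the real processing of one row
        INSERT INTO #processed (Id, Amount) VALUES (@Id, @Amount);
    END;
    FETCH NEXT FROM claim_cursor INTO @Id, @Amount;
END;
CLOSE claim_cursor;
DEALLOCATE claim_cursor;

SELECT Id, Amount FROM #processed ORDER BY Id;  -- 4 rows, one per existing key
```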