I have a table like this:
ID Mobile isOptOut
1 123 1
2 123 0
3 123 0
4 123 1
5 234 1
6 234 0
I want to number the rows partitioned by Mobile and isOptOut:
if isOptOut is equal to 1, the numbering starts from 0;
otherwise it starts from 6. The expected result is:
ID Mobile isOptOut RowNum
1 123 1 0
4 123 1 1
2 123 0 6
3 123 0 7
5 234 1 0
6 234 0 6
This is what I have tried so far:
select *,
case when isOptOut = 0 then ROW_Number() OVER(
PARTITION BY Mobile ,isOptOut
order by Mobile ,isOptOut
) as [Row Number]
from r
where isOptOut = 1
Thanks so much
You're on the right path with the ROW_NUMBER window function. However:
Since you want to number all the rows, you need to apply it to every row (not just those where isOptOut = 0).
In the window function, I've ordered by ID so that it always returns the same values (if you order by the same fields you partition by, rows within a partition can come out in any order).
Once you have the row numbers, apply the offset: if isOptOut = 0, add 5 to the row number; if it's 1, subtract 1.
; WITH src AS
(select *,
ROW_Number() OVER(
PARTITION BY Mobile, isOptOut
ORDER BY ID -- Note I changed this to 'ID' for ordering
) as [rn]
FROM r
)
SELECT ID, Mobile, isOptOut,
CASE isOptOut
WHEN 0 THEN rn + 5
WHEN 1 THEN rn - 1
ELSE NULL
END AS RowNum
FROM src
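To check the approach above against the expected result from the question, here is a minimal sketch that runs the same query shape. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions). The final ORDER BY is added just to list the rows in the order shown in the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (ID INTEGER PRIMARY KEY, Mobile INTEGER, isOptOut INTEGER)")
conn.executemany(
    "INSERT INTO r VALUES (?, ?, ?)",
    [(1, 123, 1), (2, 123, 0), (3, 123, 0), (4, 123, 1), (5, 234, 1), (6, 234, 0)],
)

query = """
WITH src AS (
    SELECT ID, Mobile, isOptOut,
           ROW_NUMBER() OVER (PARTITION BY Mobile, isOptOut ORDER BY ID) AS rn
    FROM r
)
SELECT ID, Mobile, isOptOut,
       CASE isOptOut WHEN 0 THEN rn + 5 WHEN 1 THEN rn - 1 END AS RowNum
FROM src
ORDER BY Mobile, isOptOut DESC, ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Opted-out rows are numbered from 0 and the others from 6 within each Mobile.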
SELECT ID, Mobile, isOptOut,
IIF(isOptOut = 1, ROWNUMBER - 1, ROWNUMBER + 5) AS RowNum,
ROWNUMBER
FROM (
SELECT *, ROW_NUMBER() OVER(
PARTITION BY Mobile, isOptOut ORDER BY ID) AS ROWNUMBER
FROM #table) AS T
ORDER BY ID
A subquery with IIF should get you there: the inner query computes the row number with a window function, and the outer query applies the condition.
Related
I have a table with 10 records and a column named RandomNumber whose data type is bit.
I want to fill this column randomly so that 80 percent of the records (8 records) get 0 and 20 percent (2 records) get 1.
For example:
Id RandomNumber
1 0
2 0
3 0
4 1
5 0
6 0
7 0
8 1
9 0
10 0
One way is to use ORDER BY NEWID() to assign 1 to two rows (20%) and 0 to the others (the remaining 80%), excluding the rows already assigned 1.
CREATE TABLE dbo.Example(
Id int NOT NULL CONSTRAINT PK_Test PRIMARY KEY
);
INSERT INTO dbo.Example VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
WITH ones AS (
SELECT TOP (2) Id, 1 AS RandomNumber
FROM dbo.Example
ORDER BY NEWID()
)
SELECT Id, 0 AS RandomNumber
FROM dbo.Example
WHERE Id NOT IN(SELECT Id FROM ones)
UNION ALL
SELECT Id, 1 AS RandomNumber
FROM ones
ORDER BY Id;
Alternatively, use ROW_NUMBER() OVER(ORDER BY NEWID()) and a CASE expression:
WITH example AS (
SELECT Id, ROW_NUMBER() OVER(ORDER BY NEWID()) AS rownum
FROM dbo.Example
)
SELECT Id, CASE WHEN rownum <= 2 THEN 1 ELSE 0 END AS RandomNumber
FROM example
ORDER BY Id;
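The ROW_NUMBER variant can also be written without hard-coding the number 2. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module as a stand-in for SQL Server, SQLite's random() in place of NEWID(), and a hypothetical CTE name `shuffled`. It marks a row with 1 when its random position falls in the first fifth of the table.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; random() plays the role of NEWID().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Example (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Example VALUES (?)", [(i,) for i in range(1, 11)])

query = """
WITH shuffled AS (
    -- Give every row a unique position in a random order.
    SELECT Id, ROW_NUMBER() OVER (ORDER BY random()) AS rownum
    FROM Example
)
SELECT Id,
       -- rownum * 5 <= total row count means "within the first 20%".
       CASE WHEN rownum * 5 <= (SELECT COUNT(*) FROM Example) THEN 1 ELSE 0 END
           AS RandomNumber
FROM shuffled
ORDER BY Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Which rows get the 1 changes from run to run, but with 10 rows exactly two of them do.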
I have the sample input table below. In reality it has many records.
Input:
ID Classification
123 1
123 2
123 3
123 4
657 1
657 3
657 4
For each ID, the Classification column of its records should contain all of the values 1, 2, 3 and 4. If any of these values is missing, all of that ID's records should be treated as an exception. The output should look like this:
ID Classification Flag
123 1 0
123 2 0
123 3 0
123 4 0
657 1 1
657 3 1
657 4 1
Can someone please help me with how this can be achieved in SQL Server?
Thanks.
There are a couple of options here; which is more performant is for you to test, not me (especially as I don't know what indexes you have). One uses conditional aggregation to check that all the values are there, and the other uses a subquery and counts the DISTINCT values (as I don't know if there could be duplicate classifications):
SELECT *
INTO dbo.YourTable
FROM (VALUES(123,1),
(123,2),
(123,3),
(123,4),
(657,1),
(657,3),
(657,4))V(ID,Classification);
GO
CREATE CLUSTERED INDEX CI_YourIndex ON dbo.YourTable (ID,Classification);
GO
SELECT ID,
Classification,
-- Flag is 0 when all four classifications are present and 1 otherwise, as in the expected output
CASE WHEN COUNT(CASE YT.Classification WHEN 1 THEN 1 END) OVER (PARTITION BY ID) > 0
AND COUNT(CASE YT.Classification WHEN 2 THEN 1 END) OVER (PARTITION BY ID) > 0
AND COUNT(CASE YT.Classification WHEN 3 THEN 1 END) OVER (PARTITION BY ID) > 0
AND COUNT(CASE YT.Classification WHEN 4 THEN 1 END) OVER (PARTITION BY ID) > 0 THEN 0 ELSE 1
END AS Flag
FROM dbo.YourTable YT;
GO
SELECT ID,
Classification,
-- Flag is 0 when all four classifications are present and 1 otherwise, as in the expected output
CASE (SELECT COUNT(DISTINCT sq.Classification)
FROM dbo.YourTable sq
WHERE sq.ID = YT.ID
AND sq.Classification IN (1,2,3,4)) WHEN 4 THEN 0 ELSE 1
END AS Flag
FROM dbo.YourTable YT;
GO
DROP TABLE dbo.YourTable;
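As a quick check of the subquery approach against the sample data, here is a small sketch. It runs on Python's sqlite3 module rather than SQL Server (an assumption for the sake of a self-contained example) and follows the question's convention that Flag = 1 marks an exception, that is, an ID missing at least one of the classifications 1 to 4.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Classification INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(123, 1), (123, 2), (123, 3), (123, 4), (657, 1), (657, 3), (657, 4)],
)

query = """
SELECT YT.ID,
       YT.Classification,
       -- 0 when the ID has all four distinct classifications, 1 otherwise.
       CASE WHEN (SELECT COUNT(DISTINCT sq.Classification)
                  FROM YourTable AS sq
                  WHERE sq.ID = YT.ID
                    AND sq.Classification IN (1, 2, 3, 4)) = 4
            THEN 0 ELSE 1
       END AS Flag
FROM YourTable AS YT
ORDER BY YT.ID, YT.Classification
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ID 123 has all four values and is flagged 0; ID 657 lacks classification 2, so each of its rows is flagged 1.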
I have a data set produced by a UNION query that aggregates data from 2 sources.
I want to select that data based on whether data was found in only one of those sources, or in both.
The relevant parts of the data set look like this (there are a number of other columns):
row preference group position
1 1 111 1
2 1 111 2
3 1 111 3
4 1 135 1
5 1 135 2
6 1 135 3
7 2 111 1
8 2 135 1
The [preference] column combined with the [group] column is what I'm trying to filter on. I want to return all the rows that have the same [preference] as the MIN([preference]) for each [group].
The desired output given the data above would be rows 1 through 6.
The [preference] column indicates the original source of the data in the UNION query so a legitimate data set could look like:
row preference group position
1 1 111 1
2 1 111 2
3 1 111 3
4 2 111 1
5 2 135 1
In which case the desired output would be rows 1,2,3, & 5
What I can't work out is how to do (not real code):
SELECT * WHERE [preference] = MIN([preference]) PARTITION BY [group]
One way to do this is using RANK:
SELECT row
, preference
, [group]
, position
FROM (
SELECT row
, preference
, [group]
, position
, RANK() OVER (PARTITION BY [group] ORDER BY preference) AS seq
FROM t) t2
WHERE seq = 1
This should be doable via a simple inner join:
SELECT t1.*
FROM t AS t1
INNER JOIN (SELECT [group], MIN(preference) AS preference
FROM t
GROUP BY [group]
) t2 ON t1.[group] = t2.[group]
AND t1.preference = t2.preference
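To see that filtering on the per-group minimum gives rows 1, 2, 3 and 5 for the second data set in the question, here is a minimal sketch using RANK. It runs on Python's sqlite3 module instead of SQL Server, and the columns are renamed to `row_id` and `grp` (hypothetical names) to avoid quoting keywords in SQLite.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; row_id and grp are renamed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (row_id INTEGER, preference INTEGER, grp INTEGER, position INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 111, 1), (2, 1, 111, 2), (3, 1, 111, 3), (4, 2, 111, 1), (5, 2, 135, 1)],
)

query = """
SELECT row_id, preference, grp, position
FROM (
    SELECT row_id, preference, grp, position,
           -- Every row sharing the lowest preference in its group gets rank 1.
           RANK() OVER (PARTITION BY grp ORDER BY preference) AS seq
    FROM t
) AS t2
WHERE seq = 1
ORDER BY row_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

RANK (rather than ROW_NUMBER) matters here because all rows tied on the minimum preference must be kept, not just one of them.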
The New_CommsStream column below checks whether the gap between a row's date and the previous row's date (partitioned by personid and ordered by a few other columns, including the date, in a subquery) is greater than 90 days, returning 1 if it is and 0 otherwise:
create view Motability_Dataset_Staging_cmp as
select
mdsc.PersonID,
mdsc.AddressID,
mdsc.Email,
mdsc.Reportdate_month,
mdsc.Channel,
mdsc.CommsMedium,
mdsc.Campaign_Name,
mdsc.Category,
mdsc.MRM_Campaign_code,
mdsc.Action_id,
mdsc.NumSents,
mdsc.ReportDate,
isnull(cmp.ppersonid,mdsc.PersonID) as Prev_PersonID,
isnull(cmp.paddressid,mdsc.AddressID) as Prev_AddressID,
isnull(cmp.pmrmcampaigncode,mdsc.MRM_Campaign_code) as Prev_MRMCampaignCode,
isnull(cmp.pactionid,mdsc.Action_id) as Prev_ActionID,
isnull(cmp.preportdate,mdsc.ReportDate) as Prev_ReportDate,
isnull(cmp.commsdaysinterval,0) as Prev_CommsDays,
isnull(cmp.newcommsstream,0) as New_CommsStream
from Motability_Dataset_Staging as mdsc
left join
(select
cmp.row +1 as row,pcmp.row as prow,
cmp.personid as personid,pcmp.personid as ppersonid,
cmp.addressid as addressid,pcmp.addressid as paddressid,
cmp.MRM_Campaign_code as mrmcampaigncode,pcmp.MRM_Campaign_code as pmrmcampaigncode,
cmp.Action_id as actionid,pcmp.Action_id as pactionid,
cmp.reportdate as reportdate,pcmp.reportdate as preportdate,
datediff(day,cmp.ReportDate,pcmp.ReportDate) as commsdaysinterval,
case when datediff(day,cmp.ReportDate,pcmp.ReportDate) <-90 then 1 else 0 end as newcommsstream
from
(select row_number() over(partition by personid order by personid,addressid,reportdate,mrm_campaign_code,action_id)-1 as row,personid,addressid,MRM_Campaign_code,action_id,reportdate from Motability_Dataset_Staging) cmp
inner join (select row_number() over(partition by personid order by personid,addressid,reportdate,mrm_campaign_code,action_id) as row,personid,addressid,MRM_Campaign_code,action_id,reportdate from Motability_Dataset_Staging) pcmp on cmp.row = pcmp.row and cmp.personid=pcmp.personid
) cmp
on mdsc.PersonID = cmp.personid and mdsc.AddressID = cmp.addressid and mdsc.MRM_Campaign_code=cmp.mrmcampaigncode
I'm struggling to then number the rows per personid so that the number increases by one every time New_CommsStream is 1 within the same personid, and starts at 1 otherwise:
personid new_commsstream row
1 0 1
1 0 1
1 0 1
1 1 2
1 0 2
2 0 1
3 0 1
4 0 1
5 0 1
5 1 2
5 1 3
Any ideas how to achieve this?
Thanks.
I'm not sure if it helps, but you do not need to SELECT the data for ROW_NUMBER() twice.
You can place it in a SQL Server CTE (common table expression) as follows
and then refer to it twice:
;with cmp as (
select
row_number() over(partition by personid order by addressid,reportdate,mrm_campaign_code,action_id) as row,
personid,
addressid,
MRM_Campaign_code,
action_id,
reportdate
from Motability_Dataset_Staging
), cmp2 as (
select
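cmp.*, -- current row
cmp2.row as prow, cmp2.personid as ppersonid, -- previous row's values
cmp2.addressid as paddressid,
cmp2.MRM_Campaign_code as pmrmcampaigncode,
cmp2.action_id as pactionid,
cmp2.reportdate as preportdate
from cmp
left join cmp as cmp2 -- previous row for the same person
on cmp.personid = cmp2.personid
and cmp.row = cmp2.row + 1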
)
select
mdsc.PersonID,
mdsc.AddressID,
mdsc.Email,
mdsc.Reportdate_month,
mdsc.Channel,
mdsc.CommsMedium,
mdsc.Campaign_Name,
mdsc.Category,
mdsc.MRM_Campaign_code,
mdsc.Action_id,
mdsc.NumSents,
mdsc.ReportDate,
isnull(cmp.ppersonid,mdsc.PersonID) as Prev_PersonID,
isnull(cmp.paddressid,mdsc.AddressID) as Prev_AddressID,
isnull(cmp.pmrmcampaigncode,mdsc.MRM_Campaign_code) as Prev_MRMCampaignCode,
isnull(cmp.pactionid,mdsc.Action_id) as Prev_ActionID,
isnull(cmp.preportdate,mdsc.ReportDate) as Prev_ReportDate,
isnull(cmp.commsdaysinterval,0) as Prev_CommsDays,
isnull(cmp.newcommsstream,0) as New_CommsStream
from Motability_Dataset_Staging as mdsc
inner join cmp on ......
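For the running number the question asks for, one possible approach (not part of the answer above) is a cumulative sum of New_CommsStream per personid, plus 1. The sketch below is an illustration under assumptions: it runs on Python's sqlite3 module rather than SQL Server, uses a hypothetical table `comms`, and uses a hypothetical `seq` column to stand in for whatever ordering the real view applies. In SQL Server, SUM with ORDER BY in the OVER clause requires 2012 or later.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; comms and seq are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comms (seq INTEGER, personid INTEGER, new_commsstream INTEGER)")
sample = [(1, 0), (1, 0), (1, 0), (1, 1), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (5, 1), (5, 1)]
conn.executemany(
    "INSERT INTO comms VALUES (?, ?, ?)",
    [(i, person, flag) for i, (person, flag) in enumerate(sample, start=1)],
)

query = """
SELECT personid, new_commsstream,
       -- Each 1 seen so far within the person starts a new stream number.
       1 + SUM(new_commsstream) OVER (
               PARTITION BY personid
               ORDER BY seq
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS row_no
FROM comms
ORDER BY seq
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample rows from the question this yields 1, 1, 1, 2, 2 for personid 1 and 1, 2, 3 for personid 5.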
I would like to filter duplicate rows so that, for each unique combination of rid and did, the row with the maximum active and then the minimum modified is picked. Should I use a self join, or is there an approach that performs better?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
The expected output is:
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Commenting on the first answer: the suggestion does not work for the dataset below (where the row with active = 0 also has the minimum modified):
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
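The ROW_NUMBER query above can be checked against the second data set from the question with a small sketch. It runs on Python's sqlite3 module as a stand-in for SQL Server (an assumption for a self-contained example) and stores modified as ISO-formatted text, which sorts in chronological order.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; modified is kept as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (id INTEGER, rid INTEGER, modified TEXT, active INTEGER, did INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2010-09-07 11:37:44.850", 1, 1),
        (2, 1, "2010-09-07 11:38:44.000", 1, 1),
        (3, 1, "2010-09-07 11:39:44.000", 1, 1),
        (4, 1, "2010-09-07 11:36:44.000", 0, 1),
        (5, 2, "2010-09-07 11:41:44.000", 1, 1),
        (6, 1, "2010-09-07 11:42:44.000", 1, 2),
    ],
)

query = """
WITH cte AS (
    SELECT id, rid, modified, active, did,
           -- Active rows first, then the earliest modified within each (rid, did).
           ROW_NUMBER() OVER (PARTITION BY rid, did
                              ORDER BY active DESC, modified ASC) AS rn
    FROM YourTable
)
SELECT id, rid, modified, active, did
FROM cte
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Row 4 has the earliest modified value but is inactive, so row 1 is picked for rid = 1, did = 1.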
select id, rid, min(modified), max(active), did from foo group by rid, did order by id;
You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
X.*
FROM
ParentTable P
CROSS APPLY (
SELECT TOP 1 *
FROM YourTable T
WHERE P.rid = T.rid AND P.did = T.did
ORDER BY active DESC, modified
) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
Also, here is my crazy, single scan magic query which can often outperform other methods:
SELECT
id = Substring(Packed, 6, 4),
rid,
modified = Convert(datetime, Substring(Packed, 2, 4)),
Active = Convert(bit, 1 - Substring(Packed, 1, 1)),
did
FROM
(
SELECT
rid,
did,
Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(4), modified) + Convert(binary(4), id))
FROM
YourTable
GROUP BY
rid,
did
) X
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.