SQL- Finding a gap that is x amount of months with the same foreign key - sql-server

I am editing this to clarify my question.
Let's say I have a table that holds patient information. I need to find new patients for this year, and the date of their prescription first prescription when they were considered new. Anytime there is a six month gap they are considered a new patient.
How do I accomplish this using SQL. I can do this in Java and any other imperative language easily enough, but I am having problems doing this in SQL. I need this script to be run in Crystal by non-SQL users
Table:
Patient ID Prescription Date
-----------------------------------------
1 12/31/16
1 03/13/17
2 10/10/16
2 05/11/17
2 06/11/17
3 01/01/17
3 04/20/17
4 01/31/16
4 01/01/17
4 07/02/17
So Patients 2 and 4 are considered new patients. Patient 4 is considered a new patient twice, so I need dates for each time patient 4 was considered new 1/1/17 and 7/2/17. Patients 1 and 3 are not considered new this year.
So far I have the code below which tells me if they are new this year, but not if they had another six month gap this year.
SELECT DISTINCT
this_year.patient_id
,this_year.date
FROM (SELECT
patient_id
,MIN(prescription_date) as date
FROM table
WHERE prescription_date BETWEEN '2017-01-01 00:00:00.000' AND '2017-
12-31 00:00:00.000'
GROUP BY [patient_id]) AS this_year
LEFT JOIN (SELECT
patient_id
,MAX(prescription_date) as date
FROM table
WHERE prescription_date BETWEEN '2016-01-01 00:00:00.000' AND '2016-
12-31 00:00:00.000'
GROUP BY [patient_id]) AS last_year
WHERE DATEDIFF(month, last_year.date, this_year.date) > 6
OR last_year.date IS NULL

Patient 2 in your example does not meet the criteria you specified ... that being said ...
You can try something like this ... untested but should be similar (assuming you can put this in a stored procedure):
WITH ordered AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY [Prescription Date]) rn
FROM table1
)
SELECT o1.[PatientID], DATEDIFF(s, o1.[Prescription Date], o2.[Prescription Date]) diff
FROM ordered o1 JOIN ordered o2
ON o1.rn + 1 = o2.rn
WHERE DATEDIFF(m, o1.[Prescription Date], o2.[Prescription Date]) > 6
Replace table1 with the name of your table.

I assume that you mean the patient has not been prescribed in the last 6 months.
SELECT DISTINCT user_id
FROM table_name
WHERE prescribed_date >= DATEADD(month, -6, GETDATE())
This gives you the list of users that have been prescribed in the last 6 months. You want the list of users that are not in this list.
SELECT DISTINCT user_id
FROM table_name
WHERE user_id NOT IN (SELECT DISTINCT user_id
FROM table_name
WHERE prescribed_date >= DATEADD(month, -6, GETDATE()))
You'll need to amend the field and table names.

Related

Write Query That Consider Date Interval

I have a table that contains Transactions of Customers.
I should Find Customers That had have at least 2 transaction with amount>20000 in Three consecutive days each month.
For example , Today is 2022/03/12 , I should Gather Data Of Transactions From 2022/02/13 To 2022/03/12, Then check These Data and See If a Customer had at least 2 Transaction With Amount>=20000 in Three consecutive days.
For Example, Consider Below Table:
Id
CustomerId
Transactiondate
Amount
1
1
2022-01-01
50000
2
2
2022_02_01
20000
3
3
2022_03_05
30000
4
3
2022_03_07
40000
5
2
2022_03_07
20000
6
4
2022_03_07
30000
7
4
2022_03_07
30000
The Out Put Should be : CustomerId =3 and CustomerId=4
I write query that Find Customer For Special day , but i don't know how to find these customers in one month with out using loop.
the query for special day is:
With cte (select customerid, amount, TransactionDate,Dateadd(day,-2,TransactionDate) as PrevDate
From Transaction
Where TransactionDate=2022-03-12)
Select CustomerId,Count(*)
From Cte
Where
TransactionDate>=Prevdate and TransactionDate<=TransactionDate
And Amount>=20000
Group By CustomerId
Having count(*)>=2
Hi there are many options how to achieve this.
I think that easies (from perfomance maybe not) is using LAG function:
WITH lagged_days AS (
SELECT
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
,*
FROM Transaction
), valid_cust_base as (
SELECT
*
FROM lagged_days
WHERE DATEPART(MONTH, lagged) = DATEPART(MONTH, Transactiondate)
AND datediff(day, Transactiondate, lagged_dt) <= 3
AND Amount >= 20000
)
SELECT
CustomerID
FROM valid_cust_base
GROUP BY CustomerID
HAVING COUNT(*) >= 2
First I have created lagged TransactionDate over customer (I assume that id is incremental). Then I have Selected only transactions within one month, with amount >= 20000 and where date difference between transaction is less then 4 days. Then just select customers who had more than 1 transaction.
In LAG First value is always missing per Customer missing, but you still need to be able say: 1st and 2nd transaction are within 3 days. Thats why I am replacing first NULL value with LEAD. It doesn't matter if you use:
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
OR
ISNULL(LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
The main goal is to have for each transaction closest TransactionDate.

Need to generate rows with missing data in a large dataset - SQL

We are comparing values between months over multiple years. As time moves on the number of years and months in the dataset increases. We are only interested in months where there were values for every year, i.e. a full set.
Consider the following example for 1 month (1) over 3 years (1,2,3) and two activities (101, 102)
Dataset:
Activity Month year Count
------- ---- ------ ------
101 1 1 2
101 1 2 3
101 1 3 1
102 1 1 1
102 1 2 1
In the example above only activity 101 will come into consideration as it satisfies the condition that there must be a count for the activity for month 1 IN year 1, 2 and 3.
Activity 102 doesn't qualify for further analysis as it has no record for year 3.
I would like to generate a record with which I can then evaluate this. The record will effectively generate the new record with the missing row (in this case 102, 1, 3 , 0) to complete the dataset
Activity Month year Count
------- ---- ------ ------
102 1 3 0
We find the problem difficult as the data keeps in growing, the number of activities keep expanding and it is a combination of activity, year and month that need to be evaluated.
An elegant solution will be appreciated.
As I mention in my comment, presumably you have both an Activity table and some kind of Calendar table with details of your activities and the years in your system. As such you can therefore do a CROSS JOIN between these 2 objects and then LEFT JOIN to your table to get the data set you want:
--Create sample objects/data
CREATE TABLE dbo.Activity (Activity int); --Obviously your table has more columns
INSERT INTO dbo.Activity (Activity)
VALUES (101),(102);
GO
CREATE TABLE dbo.Calendar (Year int,
Month int);--Likely your table has more columns
INSERT INTO dbo.Calendar (Year, Month)
VALUES(1,1),
(2,1),
(3,1);
GO
CREATE TABLE dbo.YourTable (Activity int,
Year int,
Month int,
[Count] int);
INSERT INTO dbo.YourTable (Activity,Month, Year, [Count])
VALUES(101,1,1,2),
(101,1,2,3),
(101,1,3,1),
(102,1,1,1),
(102,1,2,1);
GO
--Solution
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM dbo.Activity A
CROSS JOIN dbo.Calendar C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
If you don't have an Activity and Calendar table (I suggest, however, you should), then you can use subqueries with a DISTINCT, but note this will be far from performant with large data sets:
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM (SELECT DISTINCT Activity FROM dbo.YourTable) A
CROSS JOIN (SELECT DISTINCT Year, Month FROM dbo.YourTable) C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed

How to filter out from count distinct query

I am trying to calculate numbers of customers whom are active in the past 3 and 6 months.
SELECT COUNT (DISTINCT CustomerNo)
FROM SalesDetail
WHERE InvoiceDate > (GETDATE() - 180) AND InvoiceDate < (GETDATE() - 90)
SELECT COUNT (DISTINCT CustomerNo)
FROM SalesDetail
WHERE InvoiceDate > (GETDATE() - 90)
However, based on above query, I'll get count Customers which has been active for both in the last 3 months and the last 6 months, even if there are duplicates like this.
Customer A bought once in past 3 months
Customer A bought once in past 6 months too
How do I filter out the customers, so that if customer A has been active in both past 3 and 6 months, he/she will only be counted in the 'active in past 3 months' query and not in the 'active in past 6 months' too.
I solve this problem this way
Let us consider you have following table. You might have more columns but for the result you want, we only require customer_id and date they bought something on.
CREATE TABLE [dbo].[customer_invoice](
[id] [int] IDENTITY(1,1) NOT NULL,
[customer_id] [int] NULL,
[date] [date] NULL,
CONSTRAINT [PK_customer_invoice] PRIMARY KEY([id]);
I created this sample data on this table
INSERT INTO [dbo].[customer_invoice]
([customer_id]
,[date])
VALUES
(1,convert(date,'2019-12-01')),
(2,convert(date,'2019-11-05')),
(2,convert(date,'2019-8-01')),
(3,convert(date,'2019-7-01')),
(4,convert(date,'2019-4-01'));
Lets not try to jump directly on the final solution directly but take a single leap each time.
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id;
The above query gives you the number of days before each customer was active
customer_id lastActiveDays
1 15
2 41
3 168
4 259
Now We will use this query as subquery and Add a new column ActiveWithinCategory so that in later step we can group our data by the column.
SELECT customer_id, lastActiveDays,
CASE WHEN lastActiveDays<90 THEN 'active within 3 months'
WHEN lastActiveDays<180 THEN 'active within 6 months'
ELSE 'not active' END AS ActiveWithinCategory
FROM(
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id
)AS temptable;
This query gives you the the following result
customer_id lastActiveDays ActiveWithinCategory
1 15 active within 3 months
2 41 active within 3 months
3 168 active within 6 months
4 259 not active
Now use the above whole thing as subquery and Group the data using ActiveWithinCategory
SELECT ActiveWithinCategory, COUNT(*) AS NumberofCustomers FROM (
SELECT customer_id, lastActiveDays,
CASE WHEN lastActiveDays<90 THEN 'active within 3 months'
WHEN lastActiveDays<180 THEN 'active within 6 months'
ELSE 'not active' END AS ActiveWithinCategory
FROM(
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id
)AS temptable
) AS FinalResult GROUP BY ActiveWithinCategory;
And Here is your final result
ActiveWithinCategory NumberofEmployee
active within 3 months 2
active within 6 months 1
not active 1
If you want to achieve same thing is MySQL Database
Here is the final Query
SELECT ActiveWithinCategory, count(*) NumberofCustomers FROM(
SELECT MIN(DATEDIFF(curdate(),date)) AS lastActiveBefore,
IF(MIN(DATEDIFF(curdate(),date))<90,
'active within 3 months',
IF(MIN(DATEDIFF(curdate(),date))<180,'active within 6 months','not active')
) ActiveWithinCategory
FROM customer_invoice GROUP BY customer_id
) AS FinalResult GROUP BY ActiveWithinCategory;
I suspect that you want to do conditional aggregation here:
SELECT
CustomerNo,
COUNT(CASE WHEN InvoiceDate > GETDATE() - 90 THEN 1 END) AS cnt_last_3,
COUNT(CASE WHEN InvoiceDate > GETDATE() - 180 AND InvoiceDate < GETDATE() - 90
THEN 1 END) AS cnt_first_3
FROM yourTable
GROUP BY
CustomerNo;
Here cnt_last_3 is the count over the immediate past 3 months, and cnt_first_3 is the count from the 3 month period starting 6 months ago and ending 3 months ago.
If you want the distinct count you may add distinct like this
Select
count( Case when dt between getdate()- 90 and getdate() then id else null end) cnt_3_months
,count(distinct Case when dt between getdate() - 180 and getdate() - 90 then id else null end) cnt_6_months
from a

Is there an optimal way to create logical columns from physical column using SQL query statement?

I'm writing a SQL query using a table. My requirement is that I need to generate two logical columns from one physical column with certain conditions. In SQL how to generate two logical columns in final result set?
I have so far tried using sub-queries to derive those logical columns. But that sub-query returns error when incorporate it as a column in main query.
Overall there are other tables which will be joined using SQL JOIN to derive respective columns.
Columns:
CarrierName NVARCHAR(10)
MonthDate DATETIME
Stage INT
Scenario:
In my SQL Server table there is a column called Stage of type int that contains values like 1, 2, 3, 4.
Now, I have two date criteria to apply on above column to derive two logical columns in final result set.
Criteria #1:
Get carriers from past 12 months and priors to past month end date and value of "CurrentStage" should be less than and derive "PriorStage"
Example:
Current month is: March 2019 (2019-03-25) or any given date
Past latest month end date would be: 2019-02-28
12 months prior to above past latest month would be:
From 2018-02-01 To 2019-01-31
Criteria #2:
Get Carriers from past latest month end date and derive "CurrentStage"
While writing two independent SQL SELECT statements I get my desired results.
My challenge is when I think them to integrate in one select statement.
I get this error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression
Code:
DECLARE #DATE DATETIME
SET #DATE = '2018-08-25';
--QUERY 1 - RECORDS WITH PREVIOUS MONTH END DATE
SELECT
T1.CarrierName AS 'Carrier_Number',
T1.Stage AS 'Monitoring_Stage–Current'
FROM
table1 T1
WHERE
T1.Stage IS NOT NULL AND
CONVERT(DATE, T1.MonthDate) = CONVERT(DATE, DATEADD(D, -(DAY(#DATE)), #DATE))
--QUERY 2 - RECORDS FROM PAST 12 MONTHS PRIOR PREVIOUS MONTH END DATE
SELECT
T2.CarrierName,
T2.Stage AS 'Monitoring_Stage–Prior'
FROM
table2 T2
WHERE
T2.Stage IS NOT NULL AND
CONVERT(DATE, T2.MonthDate) BETWEEN CONVERT(DATE, DATEADD(M, -12, DATEADD(D, -(DAY(#DATE)), #DATE)))
AND CONVERT(DATE, DATEADD(D, -(DAY(#DATE) + (DAY(DATEADD(D, -(DAY(#DATE)), #DATE)))), #DATE))
AND T2.Stage) > (SELECT DISTINCT MAX(m.Stage AS INT))
FROM table1 m
WHERE CONVERT(DATE, m.MonthDate) = CONVERT(DATE, DATEADD(D, -(DAY(#DATE)), #DATE))
AND T2.CarrierName = m.CarrierName)
My final expected result set should contain below columns.
Where CurrentStage value is less than PriorStage value.
Expected Results
CarrierName | CurrentStage | PriorStage
--------------+--------------+-------------
C11122 | 1 | 2
C32233 | 3 | 4
Actual Result
I am looking for alternatives. I.e. CTE, Union, temp table etc.
Something like:
SELECT
CarrierName,
Query 1 Result As 'CurrentStage',
Query 2 Result As 'PrioreStage'
FROM
table1
To improve this post, I am adding my response here. My resolution below for this posted question is still under evaluation hence not posting it as my final answer. But it really brought a light to my effort.
RESOLUTION:
SELECT
DISTINCT M.CarrierName, A.[CurrentStage], B.[PriorStage]
FROM
--QUERY 1 - RECORDS WITH CURRENT MONTH END DATE
(SELECT M.CarrierName, M.CarrierID
, Stage AS 'CurrentStage'
FROM table1 M
WHERE M.Stage IS NOT NULL AND
CONVERT(date, M.MonthDate) = CONVERT(date, DATEADD(D,-(DAY(#DATE)), #DATE))
)
A **inner join**
(
--QUERY 2 - RECORDS FROM PAST 12 MONTHS PRIOR CURRENT MONTH END DATE
SELECT M2.CarrierName, M2.CarrierID
, Stage AS 'PriorStage'
FROM table1 M2
WHERE M2.Stage IS NOT NULL AND
CONVERT(date, M2.MonthDate) BETWEEN CONVERT(date, DATEADD(M, -12, DATEADD(D,-(DAY(#DATE)), #DATE)))
AND CONVERT(date, DATEADD(D,-(DAY(#DATE)+(DAY(DATEADD(D,-(DAY(#DATE)), #DATE)))), #DATE))
AND M2.Stage > (SELECT DISTINCT max(m.Stage)
FROM table1 m
WHERE CONVERT(date, m.MonthDate) = CONVERT(date, DATEADD(D,-(DAY(#DATE)), #DATE)) AND
M2.CarrierName = m.CarrierName
)
) B on b.Carrier_Number = a.Carrier_Number
INNER JOIN table1 M ON A.CarrierID = M.CarrierID AND B.CarrierID = M.CarrierID

GROUP BY DAY, CUMULATIVE SUM

I have a table in MSSQL with the following structure:
PersonId
StartDate
EndDate
I need to be able to show the number of distinct people in the table within a date range or at a given date.
As an example i need to show on a daily basis the totals per day, e.g. if we have 2 entries on the 1st June, 3 on the 2nd June and 1 on the 3rd June the system should show the following result:
1st June: 2
2nd June: 5
3rd June: 6
If however e.g. on of the entries on the 2nd June also has an end date that is 2nd June then the 3rd June result would show just 5.
Would someone be able to assist with this.
Thanks
UPDATE
This is what i have so far which seems to work. Is there a better solution though as my solution only gets me employed figures. I also need unemployed on another column - unemployed would mean either no entry in the table or date not between and no other entry as employed.
CREATE TABLE #Temp(CountTotal int NOT NULL, CountDate datetime NOT NULL);
DECLARE #StartDT DATETIME
SET #StartDT = '2015-01-01 00:00:00'
WHILE #StartDT < '2015-08-31 00:00:00'
BEGIN
INSERT INTO #Temp(CountTotal, CountDate)
SELECT COUNT(DISTINCT PERSON.Id) AS CountTotal, #StartDT AS CountDate FROM PERSON
INNER JOIN DATA_INPUT_CHANGE_LOG ON PERSON.DataInputTypeId = DATA_INPUT_CHANGE_LOG.DataInputTypeId AND PERSON.Id = DATA_INPUT_CHANGE_LOG.DataItemId
LEFT OUTER JOIN PERSON_EMPLOYMENT ON PERSON.Id = PERSON_EMPLOYMENT.PersonId
WHERE PERSON.Id > 0 AND DATA_INPUT_CHANGE_LOG.Hidden = '0' AND DATA_INPUT_CHANGE_LOG.Approved = '1'
AND ((PERSON_EMPLOYMENT.StartDate <= DATEADD(MONTH,1,#StartDT) AND PERSON_EMPLOYMENT.EndDate IS NULL)
OR (#StartDT BETWEEN PERSON_EMPLOYMENT.StartDate AND PERSON_EMPLOYMENT.EndDate) AND PERSON_EMPLOYMENT.EndDate IS NOT NULL)
SET #StartDT = DATEADD(MONTH,1,#StartDT)
END
select * from #Temp
drop TABLE #Temp
You can use the following query. The cte part is to generate a set of serial dates between the start date and end date.
DECLARE #ViewStartDate DATETIME
DECLARE #ViewEndDate DATETIME
SET #ViewStartDate = '2015-01-01 00:00:00.000';
SET #ViewEndDate = '2015-02-25 00:00:00.000';
;WITH Dates([Date])
AS
(
SELECT #ViewStartDate
UNION ALL
SELECT DATEADD(DAY, 1,Date)
FROM Dates
WHERE DATEADD(DAY, 1,Date) <= #ViewEndDate
)
SELECT [Date], COUNT(*)
FROM Dates
LEFT JOIN PersonData ON Dates.Date >= PersonData.StartDate
AND Dates.Date <= PersonData.EndDate
GROUP By [Date]
Replace the PersonData with your table name
If startdate and enddate columns can be null, then you need to add
addditional conditions to the join
It assumes one person has only one record in the same date range
You could do this by creating data where every start date is a +1 event and end date is -1 and then calculate a running total on top of that.
For example if your data is something like this
PersonId StartDate EndDate
1 20150101 20150201
2 20150102 20150115
3 20150101
You first create a data set that looks like this:
EventDate ChangeValue
20150101 +2
20150102 +1
20150115 -1
20150201 -1
And if you use running total, you'll get this:
EventDate Total
2015-01-01 2
2015-01-02 3
2015-01-15 2
2015-02-01 1
You can get it with something like this:
select
p.eventdate,
sum(p.changevalue) over (order by p.eventdate asc) as total
from
(
select startdate as eventdate, sum(1) as changevalue from personnel group by startdate
union all
select enddate, sum(-1) from personnel where enddate is not null group by enddate
) p
order by p.eventdate asc
Having window function with sum() requires SQL Server 2012. If you're using older version, you can check other options for running totals.
My example in SQL Fiddle
If you have dates that don't have any events and you need to show those too, then the best option is probably to create a separate table of dates for the whole range you'll ever need, for example 1.1.2000 - 31.12.2099.
-- Edit --
To get count for a specific day, it's possible use the same logic, but just sum everything up to that day:
declare #eventdate date
set #eventdate = '20150117'
select
sum(p.changevalue)
from
(
select startdate as eventdate, 1 as changevalue from personnel
where startdate <= #eventdate
union all
select enddate, -1 from personnel
where enddate < #eventdate
) p
Hopefully this is ok, can't test since SQL Fiddle seems to be unavailable.

Resources