Select a row of first non-null values - database

I have the following table. For each row where reviewers and voting are null, how do I fill them with the first non-null values from a later row with the same product_id? "First" here means the earliest such row when sorted by created_at.
+------------+-----------+--------+---------------------+
| product_id | reviewers | voting | created_at |
+------------+-----------+--------+---------------------+
| B0021ZFV9M | null | null | 2015-03-20 00:34:09 |
| B0021ZFV9M | 4 | 3 | 2015-03-24 00:34:09 |
| B0021ZFV9M | null | null | 2015-04-13 00:55:51 |
| B0021ZFV9M | 30 | 4 | 2015-04-15 00:44:38 |
| B00JKO4CHO | null | null | 2015-09-17 00:41:40 |
| B00JKO4CHO | null | null | 2015-09-19 00:41:47 |
| B00JKO4CHO | 50 | 1 | 2015-09-21 00:41:31 |
+------------+-----------+--------+---------------------+
Expected
+------------+-----------+--------+---------------------+
| product_id | reviewers | voting | created_at |
+------------+-----------+--------+---------------------+
| B0021ZFV9M | 4 | 3 | 2015-03-20 00:34:09 |
| B0021ZFV9M | 4 | 3 | 2015-03-24 00:34:09 |
| B0021ZFV9M | 30 | 4 | 2015-04-13 00:55:51 |
| B0021ZFV9M | 30 | 4 | 2015-04-15 00:44:38 |
| B00JKO4CHO | 50 | 1 | 2015-09-17 00:41:40 |
| B00JKO4CHO | 50 | 1 | 2015-09-19 00:41:47 |
| B00JKO4CHO | 50 | 1 | 2015-09-21 00:41:31 |
+------------+-----------+--------+---------------------+

Try this:
select
product_id,
case
when reviewers is null then (
select reviewers from test
where product_id = a.product_id
and created_at > a.created_at
and reviewers is not null
order by created_at -- take the earliest later row, not an arbitrary one
limit 1)
else reviewers
end as reviewers,
case
when voting is null then (
select voting from test
where product_id = a.product_id
and created_at > a.created_at
and voting is not null
order by created_at -- take the earliest later row, not an arbitrary one
limit 1)
else voting
end as voting,
created_at
from test a;
Example: http://sqlfiddle.com/#!9/546dff/3
create table test (
product_id varchar(20),
reviewers int,
voting int,
created_at datetime
);
insert into test values
('B0021ZFV9M',null , null ,'2015-03-20 00:34:09')
,('B0021ZFV9M',4 , 3 ,'2015-03-24 00:34:09')
,('B0021ZFV9M',null , null ,'2015-04-13 00:55:51')
,('B0021ZFV9M',30 , 4 ,'2015-04-15 00:44:38')
,('B00JKO4CHO',null , null ,'2015-09-17 00:41:40')
,('B00JKO4CHO',null , null ,'2015-09-19 00:41:47')
,('B00JKO4CHO',50 , 1 ,'2015-09-21 00:41:31');
Result:
| product_id | reviewers | voting | created_at |
|------------|-----------|--------|-----------------------------|
| B0021ZFV9M | 4 | 3 | March, 20 2015 00:34:09 |
| B0021ZFV9M | 4 | 3 | March, 24 2015 00:34:09 |
| B0021ZFV9M | 30 | 4 | April, 13 2015 00:55:51 |
| B0021ZFV9M | 30 | 4 | April, 15 2015 00:44:38 |
| B00JKO4CHO | 50 | 1 | September, 17 2015 00:41:40 |
| B00JKO4CHO | 50 | 1 | September, 19 2015 00:41:47 |
| B00JKO4CHO | 50 | 1 | September, 21 2015 00:41:31 |
EDIT:
To update old data, you could do this:
-- create a duplicate empty table
create table test1 like test;
-- insert good data into this duplicate table
insert into test1
select
product_id,
case
when reviewers is null then (
select reviewers from test
where product_id = a.product_id
and created_at > a.created_at
and reviewers is not null
order by created_at -- take the earliest later row, not an arbitrary one
limit 1)
else reviewers
end as reviewers,
case
when voting is null then (
select voting from test
where product_id = a.product_id
and created_at > a.created_at
and voting is not null
order by created_at -- take the earliest later row, not an arbitrary one
limit 1)
else voting
end as voting,
created_at
from test a;
-- remove data from original table
truncate table test;
-- re-insert good data into original table
insert into test select * from test1;
-- drop the duplicate table
drop table test1;
Make a backup of test (original) table before you try this.

select distinct on (a.product_id, a.created_at)
a.product_id,
coalesce(a.reviewers, b.reviewers) reviewers,
coalesce(a.voting, b.voting) voting,
a.created_at
from a_table a
left join a_table b
on a.product_id = b.product_id
and b.reviewers notnull
and b.created_at > a.created_at
order by 1, 4, b.created_at; -- b.created_at makes DISTINCT ON keep the earliest later row
SqlFiddle.
Note: it is assumed that if reviewers is not null then voting is not null too.
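For anyone who wants to check the backfill logic without a MySQL or PostgreSQL instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table name test and the sample rows come from the question; the function name backfill and the use of SQLite are my own assumptions, and the key point is the ORDER BY inside each subquery so that LIMIT 1 picks the earliest later row.

```python
import sqlite3

# Sample rows from the question: (product_id, reviewers, voting, created_at)
ROWS = [
    ("B0021ZFV9M", None, None, "2015-03-20 00:34:09"),
    ("B0021ZFV9M", 4, 3, "2015-03-24 00:34:09"),
    ("B0021ZFV9M", None, None, "2015-04-13 00:55:51"),
    ("B0021ZFV9M", 30, 4, "2015-04-15 00:44:38"),
    ("B00JKO4CHO", None, None, "2015-09-17 00:41:40"),
    ("B00JKO4CHO", None, None, "2015-09-19 00:41:47"),
    ("B00JKO4CHO", 50, 1, "2015-09-21 00:41:31"),
]

# For each NULL, look up the earliest later non-NULL value of the same product.
BACKFILL_SQL = """
SELECT a.product_id,
       COALESCE(a.reviewers,
                (SELECT b.reviewers FROM test b
                  WHERE b.product_id = a.product_id
                    AND b.created_at > a.created_at
                    AND b.reviewers IS NOT NULL
                  ORDER BY b.created_at
                  LIMIT 1)) AS reviewers,
       COALESCE(a.voting,
                (SELECT b.voting FROM test b
                  WHERE b.product_id = a.product_id
                    AND b.created_at > a.created_at
                    AND b.voting IS NOT NULL
                  ORDER BY b.created_at
                  LIMIT 1)) AS voting,
       a.created_at
FROM test a
ORDER BY a.product_id, a.created_at
"""


def backfill(rows):
    """Load rows into an in-memory table and return the backfilled result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (product_id TEXT, reviewers INTEGER, "
        "voting INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(BACKFILL_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in backfill(ROWS):
        print(row)
```

Rows that have no later non-null value in the same product would stay NULL under this approach.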

Related

How to generate IDs based on column values

I will provide examples and code where I can. Assume everything except [CycleStart] and [CycleEnd] datatypes are Varchar, I'm not too fussed about this at this stage.
Table A consists of the following RAW sample data:
+-------+---------+----------------+------------+------------+
| JobID | JobName | CycleDesc | CycleStart | CycleEnd |
+-------+---------+----------------+------------+------------+
| 10003 | Run1 | January 2019 | 31/12/2018 | 31/12/2018 |
| 10005 | Run2 | December 2018 | 31/12/2017 | 31/11/2018 |
| 10006 | Run3 | March 2019 | 31/12/2018 | 31/02/2019 |
| 10007 | Run4 | September 2019 | 31/12/2018 | 31/09/2019 |
| 10008 | Run5 | November 2019 | 31/12/2018 | 31/10/2019 |
+-------+---------+----------------+------------+------------+
Table B consists of the following sample data and the code used to generate this data is below:
+-------+---------+---------+
| JobID | PeriodID | Entity |
+-------+---------+---------+
| 10003 | 202101 | XYZ1 |
| 10003 | 202112 | XYZ2 |
| 10007 | 202008 | XYZ3 |
| 10007 | 202003 | XYZ4 |
| 10008 | 201904 | XYZ5 |
+-------+----------+--------+
DECLARE @Counter3 INT
SELECT @Counter3 = 1
WHILE @Counter3 <= 1000
BEGIN
INSERT INTO [dbo].[TableB]
SELECT
FLOOR(RAND()*(33979-1+1))+1 [JobID]
,CAST(ROUND(((2021 - 2019 -1) * RAND() + 2020), 0) AS VARCHAR) + RIGHT('0'+CAST(FLOOR(RAND()*(12-1+1))+1 AS VARCHAR),2) [PeriodID]
,FLOOR(RAND()*(23396-1+1))+1 [Entity]
SET @Counter3 = @Counter3 + 1 -- advance the loop counter
END
The issue lies within Table B column [PeriodID]. This column represents an ID generated from [CycleStart] in Table A e.g. 31/12/2018 = 201812 (YYYYMM).
What I want to show in Table B is one Period ID per month for each Job ID, starting at the month of the [CycleStart] date and running through 30 years after it. Example table of what I am looking to achieve:
+-------+---------+---------+
| JobID | PeriodID | Entity |
+-------+---------+---------+
| 10006 | 201812 | XYZ1 |
| 10006 | 201901 | XYZ2 |
| 10006 | 201902 | XYZ3 |
| 10006 | 201903 | XYZ4 |
| 10006 | 201904 | XYZ5 |
| 10006 | 201905 | XYZ5 |
| 10006 | 201906 | XYZ5 |
| 10006 | 201907 | XYZ5 |
| ... | +30yrs | ... |
| 10006 | 204812 | XYZ5 |
+-------+----------+--------+
How can I achieve this? Currently I am just randomly generating IDs that are not related to the [CycleStart] date, which skews my data, but this is the only way I can think of doing it.
The best way is to create a calendar table / date dimension. You can use this table to solve this issue and reuse it for other problems later. (Search online for examples of how to build one.)
Once you have this table, you only need to join to it.
e.g.
INSERT INTO TableB ( JobID , PeriodID)
SELECT DISTINCT A.JobID , D.TheYear * 100 + D.TheMonth
FROM tableA A
JOIN myDateTable D
ON D.TheDate BETWEEN CONVERT(date , A.CycleStart , 103) AND DATEADD(YEAR,30, CONVERT(date , A.CycleStart , 103));
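To sanity-check which Period IDs the join should produce for one job, here is a small Python sketch that generates them directly. The function name month_periods is hypothetical; it assumes [CycleStart] is a dd/mm/yyyy string (matching style 103 in the CONVERT above) and that the range is inclusive at both ends, as in the expected table.

```python
from datetime import datetime
from typing import List


def month_periods(cycle_start: str, years: int = 30) -> List[int]:
    """Return YYYYMM period IDs from the start month through `years` later.

    Assumes `cycle_start` is formatted as dd/mm/yyyy.
    """
    start = datetime.strptime(cycle_start, "%d/%m/%Y").date()
    # Count months from year 0 so that stepping by one month is plain +1.
    first = start.year * 12 + (start.month - 1)
    last = first + years * 12  # inclusive end: same month, `years` later
    return [(m // 12) * 100 + (m % 12) + 1 for m in range(first, last + 1)]


if __name__ == "__main__":
    periods = month_periods("31/12/2018")
    print(periods[:4], "...", periods[-1])
```

Comparing this list with the rows inserted for a single JobID is a quick way to confirm the calendar table covers the whole range.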

T-SQL: Values are grouped by month, if there is no value for a month the month should also appear and display "NULL"

I have a SQL query that displays turnover, stock and other values for stores, grouped by month. Naturally, if there is no value for a month, the month doesn't appear. The goal is for the empty months to appear and display "NULL" for the values. The months should range from the @FROM to the @TO parameter (201807 to 201907 in this case).
Before:
+-------+--------+----------+----------+-------+
| Store | Month | Incoming | Turnover | Stock |
+-------+--------+----------+----------+-------+
| 123 | 201810 | 5 | 4 | 1 |
| 123 | 201811 | 0 | 1 | 0 |
| 123 | 201901 | 25 | 5 | 20 |
| 123 | 201902 | 5 | 10 | 15 |
| 123 | 201903 | 8 | 9 | 14 |
| 123 | 201904 | 5 | 4 | 15 |
| 123 | 201905 | 10 | 5 | 20 |
+-------+--------+----------+----------+-------+
After:
+-------+--------+----------+----------+-------+
| Store | Month | Incoming | Turnover | Stock |
+-------+--------+----------+----------+-------+
| 123 | 201807 | NULL | NULL | NULL |
| 123 | 201808 | NULL | NULL | NULL |
| 123 | 201809 | NULL | NULL | NULL |
| 123 | 201810 | 5 | 4 | 1 |
| 123 | 201811 | 0 | 1 | 0 |
| 123 | 201812 | NULL | NULL | NULL |
| 123 | 201901 | 25 | 5 | 20 |
| 123 | 201902 | 5 | 10 | 15 |
| 123 | 201903 | 8 | 9 | 14 |
| 123 | 201904 | 5 | 4 | 15 |
| 123 | 201905 | 10 | 5 | 20 |
| 123 | 201906 | NULL | NULL | NULL |
| 123 | 201907 | NULL | NULL | NULL |
+-------+--------+----------+----------+-------+
Code Example: db<>fiddle
I have absolutely no idea how to solve this and will thank you in advance for your help! :)
You can use a recursive CTE to build a calendar table, then LEFT JOIN your data to it:
;WITH CTE AS (
SELECT CAST(CAST(@FROM AS VARCHAR(10)) + '01' AS DATE) fromDt,
CAST(CAST(@TO AS VARCHAR(10)) + '01' AS DATE) toDt,
Store
FROM (SELECT DISTINCT Store FROM #Test) t1
UNION ALL
SELECT DATEADD(MONTH,1,fromDt),toDt,Store
FROM CTE
WHERE DATEADD(MONTH,1,fromDt) <= toDt
)
SELECT FORMAT(fromDt,'yyyyMM') Month,
c.Store,
t.Incoming,
t.Turnover,
t.Stock
FROM CTE c
LEFT JOIN #Test t on
c.fromDt = CAST(CAST(t.Month AS VARCHAR(10)) + '01' AS DATE)
and
c.Store = t.Store
sqlfiddle
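The same pattern (generate every month, cross it with the stores, then LEFT JOIN the data) can be tried out with this minimal Python sketch against in-memory SQLite. The table name sales, the function name fill_months and the integer YYYYMM arithmetic are my own assumptions for illustration; the T-SQL above works on real dates instead.

```python
import sqlite3

# Sample rows from the question: (store, month, incoming, turnover, stock)
SALES = [
    (123, 201810, 5, 4, 1),
    (123, 201811, 0, 1, 0),
    (123, 201901, 25, 5, 20),
    (123, 201902, 5, 10, 15),
    (123, 201903, 8, 9, 14),
    (123, 201904, 5, 4, 15),
    (123, 201905, 10, 5, 20),
]

# The recursive CTE walks YYYYMM integers; December rolls over to January
# of the next year by adding 89 (e.g. 201812 + 89 = 201901).
FILL_SQL = """
WITH RECURSIVE months(month) AS (
    SELECT :from_month
    UNION ALL
    SELECT CASE WHEN month % 100 = 12 THEN month + 89 ELSE month + 1 END
    FROM months
    WHERE month < :to_month
)
SELECT s.store, m.month, t.incoming, t.turnover, t.stock
FROM (SELECT DISTINCT store FROM sales) AS s
CROSS JOIN months AS m
LEFT JOIN sales AS t
       ON t.store = s.store AND t.month = m.month
ORDER BY s.store, m.month
"""


def fill_months(rows, from_month, to_month):
    """Return one row per store and month, with None where data is missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INTEGER, month INTEGER, "
        "incoming INTEGER, turnover INTEGER, stock INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    params = {"from_month": from_month, "to_month": to_month}
    result = conn.execute(FILL_SQL, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_months(SALES, 201807, 201907):
        print(row)
```

Months without data come back with None in the value columns, which corresponds to NULL in the desired output.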

Cumulative Count of NULL restarting at NOT NULL

I would like to add a column indicating the number of invites a person received before they accepted, by incrementally counting the NULL rows before a non-NULL one, partitioning by PERSON_ID and ordering by INVITED_DATE.
My table has the following format:
| UNIQUE_ID | PERSON_ID | INVITED_DATE | ACCEPTED_DATE |
| 12345 | 567 | 12-01-18 | NULL |
| 12346 | 567 | 12-02-18 | NULL |
| 12347 | 567 | 12-03-18 | NULL |
| 12348 | 567 | 12-04-18 | 12-04-18 |
| 12349 | 567 | 12-05-18 | NULL |
| 12350 | 568 | 12-01-18 | NULL |
| 12351 | 568 | 12-02-18 | 12-02-18 |
The output should ideally look like the following:
| UNIQUE_ID | PERSON_ID | INVITED_DATE | ACCEPTED_DATE | INVITES_BEFORE_ACCEPT |
| 12345 | 567 | 12-01-18 | NULL | 1 |
| 12346 | 567 | 12-02-18 | NULL | 2 |
| 12347 | 567 | 12-03-18 | NULL | 3 |
| 12348 | 567 | 12-04-18 | 12-04-18 | 0 |
| 12349 | 567 | 12-05-18 | NULL | 1 |
| 12350 | 568 | 12-01-18 | NULL | 1 |
| 12351 | 568 | 12-02-18 | 12-02-18 | 0 |
So far I've tried a number of iterations of ROW_NUMBER with OVER and PARTITION BY, but I've found it will need to be an OUTER APPLY. The following OUTER APPLY counts over the data but doesn't restart the count after a successful accept.
SELECT t.* , invites.INVITES_BEFORE_ACCEPT
FROM table t
OUTER APPLY (
SELECT COUNT(*) INVITES_BEFORE_ACCEPT
FROM table t2
WHERE t.PERSON_ID = t2.PERSON_ID and t.INVITED_DATE < t2.ACCEPTED_DATE
) invites
One way would be
WITH t
AS (SELECT *,
COUNT(ACCEPTED_DATE)
OVER (
PARTITION BY PERSON_ID
ORDER BY INVITED_DATE) AS Grp
FROM [table])
SELECT *,
SUM(CASE
WHEN ACCEPTED_DATE IS NULL
THEN 1
ELSE 0
END)
OVER (
PARTITION BY PERSON_ID, Grp
ORDER BY INVITED_DATE) AS INVITES_BEFORE_ACCEPT
FROM t
Demo
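The trick above is that COUNT(ACCEPTED_DATE) ignores NULLs, so the running count only increases on an accepted row; that gives each stretch of invites its own group number, and the NULLs are then counted within each group. Here is a minimal sketch of the same two-step query run from Python against in-memory SQLite (this assumes SQLite 3.25 or later for window functions). The table name invites, the function name count_invites and the ISO-formatted dates are my assumptions, not part of the question.

```python
import sqlite3

# Sample rows: (unique_id, person_id, invited_date, accepted_date)
# Dates are rewritten as ISO strings so they sort correctly as text.
INVITES = [
    (12345, 567, "2018-12-01", None),
    (12346, 567, "2018-12-02", None),
    (12347, 567, "2018-12-03", None),
    (12348, 567, "2018-12-04", "2018-12-04"),
    (12349, 567, "2018-12-05", None),
    (12350, 568, "2018-12-01", None),
    (12351, 568, "2018-12-02", "2018-12-02"),
]

COUNT_SQL = """
WITH grouped AS (
    -- Running count of accepts: bumps by one on every accepted row.
    SELECT *,
           COUNT(accepted_date) OVER (
               PARTITION BY person_id ORDER BY invited_date) AS grp
    FROM invites
)
SELECT unique_id,
       -- Running count of NULL accepts, restarted for each group.
       SUM(CASE WHEN accepted_date IS NULL THEN 1 ELSE 0 END) OVER (
           PARTITION BY person_id, grp
           ORDER BY invited_date) AS invites_before_accept
FROM grouped
ORDER BY unique_id
"""


def count_invites(rows):
    """Return (unique_id, invites_before_accept) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invites (unique_id INTEGER, person_id INTEGER, "
        "invited_date TEXT, accepted_date TEXT)"
    )
    conn.executemany("INSERT INTO invites VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(COUNT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in count_invites(INVITES):
        print(row)
```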

Group Non-Contiguous Dates By Criteria In Column

I have a table with start and end dates for team consultations with customers.
I need to merge certain consultations based on a number of days specified in another column (sometimes the consultations may overlap, sometimes they are contiguous, sometimes they aren't), Team and Type.
Some example data is as follows:
DECLARE @TempTable TABLE([CUSTOMER_ID] INT
,[TEAM] VARCHAR(1)
,[TYPE] VARCHAR(1)
,[START_DATE] DATETIME
,[END_DATE] DATETIME
,[GROUP_DAYS_CRITERIA] INT)
INSERT INTO @TempTable VALUES (1,'A','A','2013-08-07','2013-12-31',28)
,(2,'B','A','2015-05-15','2015-05-28',28)
,(2,'B','A','2015-05-15','2016-05-12',28)
,(2,'B','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2013-05-27','2014-07-23',28)
,(3,'C','A','2015-01-12','2015-05-28',28)
,(3,'B','A','2015-01-12','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-12-17',28)
,(4,'A','B','2013-07-09','2014-04-21',7)
,(4,'A','B','2014-04-29','2014-08-01',7)
Which looks like this:
+-------------+------+------+------------+------------+---------------------+
| CUSTOMER_ID | TEAM | TYPE | START_DATE | END_DATE | GROUP_DAYS_CRITERIA |
+-------------+------+------+------------+------------+---------------------+
| 1 | A | A | 07/08/2013 | 31/12/2013 | 28 |
| 2 | B | A | 15/05/2015 | 28/05/2015 | 28 |
| 2 | B | A | 15/05/2015 | 12/05/2016 | 28 |
| 2 | B | A | 28/05/2015 | 28/05/2015 | 28 |
| 3 | C | A | 27/05/2013 | 23/07/2014 | 28 |
| 3 | C | A | 12/01/2015 | 28/05/2015 | 28 |
| 3 | B | A | 12/01/2015 | 28/05/2015 | 28 |
| 3 | C | A | 28/05/2015 | 28/05/2015 | 28 |
| 3 | C | A | 28/05/2015 | 17/12/2015 | 28 |
| 4 | A | B | 09/07/2013 | 21/04/2014 | 7 |
| 4 | A | B | 29/04/2014 | 01/08/2014 | 7 |
+-------------+------+------+------------+------------+---------------------+
My desired output is as follows:
+-------------+------+------+------------+------------+---------------------+
| CUSTOMER_ID | TEAM | TYPE | START_DATE | END_DATE | GROUP_DAYS_CRITERIA |
+-------------+------+------+------------+------------+---------------------+
| 1 | A | A | 07/08/2013 | 31/12/2013 | 28 |
| 2 | B | A | 15/05/2015 | 12/05/2016 | 28 |
| 3 | C | A | 27/05/2013 | 23/07/2014 | 28 |
| 3 | C | A | 12/01/2015 | 17/12/2015 | 28 |
| 3 | B | A | 12/01/2015 | 28/05/2015 | 28 |
| 4 | A | B | 09/07/2013 | 21/04/2014 | 7 |
| 4 | A | B | 29/04/2014 | 01/08/2014 | 7 |
+-------------+------+------+------------+------------+---------------------+
I am struggling to do this at all, let alone with any efficiency! Any ideas / code will be greatly received.
Server version is MS SQL Server 2014
Thanks,
Dan
If I am understanding your question correctly, we want to return rows only when a second, third, etc consultation has not occurred within group_days_criteria number of days after the previous consultation end date.
We can get the previous consultation end date and eliminate rows (since we are not concerned with the number of consultations) where a consultation occurred for the same customer by the same team and of the same consultation type within our date range.
DECLARE @TempTable TABLE([CUSTOMER_ID] INT
,[TEAM] VARCHAR(1)
,[TYPE] VARCHAR(1)
,[START_DATE] DATETIME
,[END_DATE] DATETIME
,[GROUP_DAYS_CRITERIA] INT)
INSERT INTO @TempTable VALUES (1,'A','A','2013-08-07','2013-12-31',28)
,(2,'B','A','2015-05-15','2015-05-28',28)
,(2,'B','A','2015-05-15','2016-05-12',28)
,(2,'B','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2013-05-27','2014-07-23',28)
,(3,'C','A','2015-01-12','2015-05-28',28)
,(3,'B','A','2015-01-12','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-12-17',28)
,(4,'A','B','2013-07-09','2014-04-21',7)
,(4,'A','B','2014-04-29','2014-08-01',7)
;with prep as (
select Customer_ID,
Team,
[Type],
[Start_Date],
[End_Date],
Group_Days_Criteria,
ROW_NUMBER() over (partition by customer_id, team, [type] order by [start_date] asc, [end_date] desc) as rn, -- earliest start date with latest end date
lag([End_Date] + Group_Days_Criteria, 1, 0) over (partition by customer_id, team, [type] order by [start_date] asc, [end_date] desc) as PreviousEndDate -- previous end date + group_days_criteria
from @TempTable
)
select p.Customer_Id,
p.[Team],
p.[Type],
p.[Start_Date],
p.[End_Date],
p.Group_Days_Criteria
from prep p
where p.rn = 1
or (p.rn != 1 and p.[Start_date] > p.PreviousEndDate)
order by p.Customer_Id, p.[Team], p.[Start_Date], p.[Type]
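This returns the desired rows for the sample data, with one difference: for customer 3, team C, the merged consultation starting 12/01/2015 is reported as ending on 28/05/2015 rather than 17/12/2015, because the query keeps the first row of each group as-is instead of taking the latest end date within the group.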
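If the merged rows need to carry the latest end date of everything folded into them, one option is a single pass over the rows sorted by start date, extending the current group whenever the next start falls within GROUP_DAYS_CRITERIA days of the group's latest end so far. The sketch below does this in plain Python rather than T-SQL; the function name merge_consultations is hypothetical, and it assumes the criteria value is the same for every row of a given customer, team and type.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows: (customer_id, team, type, start_date, end_date, group_days)
CONSULTATIONS = [
    (1, "A", "A", date(2013, 8, 7), date(2013, 12, 31), 28),
    (2, "B", "A", date(2015, 5, 15), date(2015, 5, 28), 28),
    (2, "B", "A", date(2015, 5, 15), date(2016, 5, 12), 28),
    (2, "B", "A", date(2015, 5, 28), date(2015, 5, 28), 28),
    (3, "C", "A", date(2013, 5, 27), date(2014, 7, 23), 28),
    (3, "C", "A", date(2015, 1, 12), date(2015, 5, 28), 28),
    (3, "B", "A", date(2015, 1, 12), date(2015, 5, 28), 28),
    (3, "C", "A", date(2015, 5, 28), date(2015, 5, 28), 28),
    (3, "C", "A", date(2015, 5, 28), date(2015, 12, 17), 28),
    (4, "A", "B", date(2013, 7, 9), date(2014, 4, 21), 7),
    (4, "A", "B", date(2014, 4, 29), date(2014, 8, 1), 7),
]


def merge_consultations(rows):
    """Merge rows per (customer, team, type) when gaps are within the criteria."""
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2], r[3]))
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1], r[2])):
        current = None
        for cust, team, typ, start, end, days in group:
            # Same group if this start is within `days` of the latest end so far.
            if current is not None and start <= current[4] + timedelta(days=days):
                current[4] = max(current[4], end)
            else:
                if current is not None:
                    merged.append(tuple(current))
                current = [cust, team, typ, start, end, days]
        merged.append(tuple(current))
    return merged


if __name__ == "__main__":
    for row in merge_consultations(CONSULTATIONS):
        print(row)
```

Tracking the running maximum end date (rather than only the previous row's end date) also handles a long consultation that fully contains several shorter ones.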

Select a specific line if i have the same information

I have a table with data as below:
+--------+----------+-------+------------+--------------+
| month | code | type | date | PersonID |
+--------+----------+-------+------------+--------------+
| 201501 | 178954 | 3 | 2014-12-3 | 10 |
| 201501 | 178954 | 3 | 2014-12-3 | 10 |
| 201501 | 178955 | 2 | 2014-12-13 | 10 |
| 201501 | 178955 | 2 | 2014-12-13 | 10 |
| 201501 | 178956 | 2 | 2014-12-11 | 10 |
| 201501 | 178958 | 1 | 2014-12-10 | 10 |
| 201501 | 178959 | 2 | 2014-12-12 | 15 |
| 201501 | 178959 | 2 | 2014-12-12 | 15 |
| 201501 | 178954 | 1 | 2014-12-11 | 13 |
| 201501 | 178954 | 1 | 2014-12-11 | 13 |
+--------+----------+-------+------------+--------------+
The first 6 rows have the same PersonID in the same month. When a PersonID appears more than once in the same month, I want to select the row with type 2 and the most recent date. In my case the output would be as below:
+--------+--------+------+------------+----------+
| month | code | type| date | PersonID |
+--------+--------+------+------------+----------+
| 201501 | 178955 | 2 | 2014-12-13 | 10 |
| 201501 | 178959 | 2 | 2014-12-12 | 15 |
| 201501 | 178954 | 2 | 2014-12-11 | 13 |
+--------+--------+------+------------+----------+
Also, if there are duplicate rows, I don't want to display them.
Is there any solution for that?
Simply use GROUP BY:
https://msdn.microsoft.com/de-de/library/ms177673(v=sql.120).aspx
SELECT month, code, ... FROM tablename GROUP BY PersonID, date, ...
Note that every selected column that is not aggregated must appear in the GROUP BY.
SELECT DISTINCT A.month, A.code, A.type, B.date, B.PersonID FROM YourTable A
INNER JOIN (SELECT PersonID, month, MAX(date) as date FROM YourTable
WHERE type = 2
GROUP BY PersonID, month) B
ON (A.PersonID = B.PersonID
AND A.month = B.month
AND A.date = B.date)
WHERE A.type = 2 ORDER BY B.date DESC, A.PersonID
Just in case you/others are still wondering.
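Another way to express "latest type 2 row per person and month" is to number the type 2 rows within each group and keep the first. Here is a minimal sketch of that, run from Python against in-memory SQLite (assuming SQLite 3.25 or later for ROW_NUMBER). The table name visits, the column name person_id, the function name latest_type_two and the zero-padded dates are my assumptions for illustration.

```python
import sqlite3

# Sample rows: (month, code, type, date, person_id)
# Dates are zero-padded so that text comparison matches date order.
ROWS = [
    (201501, 178954, 3, "2014-12-03", 10),
    (201501, 178954, 3, "2014-12-03", 10),
    (201501, 178955, 2, "2014-12-13", 10),
    (201501, 178955, 2, "2014-12-13", 10),
    (201501, 178956, 2, "2014-12-11", 10),
    (201501, 178958, 1, "2014-12-10", 10),
    (201501, 178959, 2, "2014-12-12", 15),
    (201501, 178959, 2, "2014-12-12", 15),
    (201501, 178954, 1, "2014-12-11", 13),
    (201501, 178954, 1, "2014-12-11", 13),
]

LATEST_SQL = """
WITH ranked AS (
    -- Number the type 2 rows per person and month, newest first.
    SELECT month, code, type, date, person_id,
           ROW_NUMBER() OVER (
               PARTITION BY month, person_id
               ORDER BY date DESC) AS rn
    FROM visits
    WHERE type = 2
)
SELECT month, code, type, date, person_id
FROM ranked
WHERE rn = 1          -- one row per group, so duplicates disappear too
ORDER BY date DESC
"""


def latest_type_two(rows):
    """Return the most recent type 2 row for each (month, person_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (month INTEGER, code INTEGER, type INTEGER, "
        "date TEXT, person_id INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_type_two(ROWS):
        print(row)
```

With the sample data as given, PersonID 13 has only type 1 rows, so it does not appear in the result.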
