I am having difficulty writing a SQL script.
I have a table like this:
And I want to have a result like this:
I tried the MIN and MAX functions, but that didn't work.
Do you have any idea?
Thank you for your help
MIN() and MAX() do appear to get you what you want. FYI, I have converted your dates to yyyy-MM-dd format.
IF OBJECT_ID('tempdb..#YourTable','U') IS NOT NULL DROP TABLE #YourTable; --SELECT * FROM #YourTable
CREATE TABLE #YourTable (
Business_Key int NOT NULL,
[Name] varchar(10) NOT NULL,
[Attribute] varchar(10) NOT NULL,
ValidFrom date NOT NULL,
ValidTo date NOT NULL,
Primary_Key int NOT NULL,
);
INSERT INTO #YourTable (Business_Key, [Name], Attribute, ValidFrom, ValidTo, Primary_Key)
VALUES (1, 'Toto', 'Child', '2020-01-01', '2020-01-03', 1)
, (1, 'Toto', 'Child', '2020-01-03', '2020-01-10', 2)
, (1, 'Toto', 'Man' , '2020-01-10', '2020-01-15', 3)
, (2, 'Tata', 'Woman', '2020-01-01', '2020-01-15', 4)
, (3, 'Titi', 'Man' , '2020-01-01', '2020-01-15', 5)
, (3, 'Titi', 'Man' , '2020-01-05', '2020-01-17', 6)
SELECT Business_Key
, [Name]
, [Attribute]
, ValidFrom = MIN(ValidFrom)
, ValidTo = MAX(ValidTo)
, Primary_Key = MAX(Primary_Key)
FROM #YourTable yt
GROUP BY Business_Key, [Name], [Attribute]
Returns:
| Business_Key | Name | Attribute | ValidFrom | ValidTo | Primary_Key |
|--------------|------|-----------|------------|------------|-------------|
| 1 | Toto | Child | 2020-01-01 | 2020-01-10 | 2 |
| 1 | Toto | Man | 2020-01-10 | 2020-01-15 | 3 |
| 2 | Tata | Woman | 2020-01-01 | 2020-01-15 | 4 |
| 3 | Titi | Man | 2020-01-01 | 2020-01-17 | 6 |
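One caveat: grouping on Attribute alone merges every period that shares an attribute, even when a different attribute applied in between (for example Child, then Man, then Child again). If that can happen in your data, a gaps-and-islands variant may be safer. The following is only a sketch, assuming the rows for a Business_Key can be ordered by ValidFrom; IsNewIsland and IslandNo are illustrative names that are not part of the original table.

```sql
-- Sketch: start a new "island" whenever the attribute differs from the previous row
-- of the same Business_Key, then aggregate per island instead of per attribute.
WITH marked AS
(
    SELECT *
         , IsNewIsland = CASE WHEN LAG([Attribute]) OVER (PARTITION BY Business_Key
                                                          ORDER BY ValidFrom, Primary_Key) = [Attribute]
                              THEN 0 ELSE 1 END
    FROM #YourTable
)
, islands AS
(
    SELECT *
         -- Running total of island starts gives each run of equal attributes its own number.
         , IslandNo = SUM(IsNewIsland) OVER (PARTITION BY Business_Key
                                             ORDER BY ValidFrom, Primary_Key
                                             ROWS UNBOUNDED PRECEDING)
    FROM marked
)
SELECT Business_Key
     , [Name]
     , [Attribute]
     , ValidFrom   = MIN(ValidFrom)
     , ValidTo     = MAX(ValidTo)
     , Primary_Key = MAX(Primary_Key)
FROM islands
GROUP BY Business_Key, [Name], [Attribute], IslandNo;
```

On the sample rows above this should return the same four rows as the plain GROUP BY, because no attribute reappears after a different one; the two queries only diverge when it does.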
I am tracking data in my SCD table, as shown in the image below, using an SSIS package.
I need to add a new column, "Column Updated" (as depicted above), which records which columns changed between transaction N-1 and transaction N. This can be done with a cursor, but I am looking for a more efficient approach. Is it possible to do this within the SCD component, or with a built-in SQL Server function?
adding script:
Create table SCDtest
(
id int ,
empid int ,
Deptid varchar(10),
Ename varchar(50),
DeptName varchar(50),
city varchar(50),
startdate datetime,
Enddate datetime ,
ColumnUpdated varchar(500)
)
Insert into SCDtest values (1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020','' )
Insert into SCDtest values (2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020','' )
Insert into SCDtest values (3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020','' )
Insert into SCDtest values (4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL ,'' )
Insert into SCDtest values (5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL,'')
Thank you
Honestly, I don't really know why you need this functionality, since you can easily compare the two rows to see any changes on the off chance that you actually need to. I've never needed a ColumnUpdated type of value, and I don't think it is worth the processing required to generate it and the storage to hold it.
That said, here is one way to calculate the desired output from your test data. Ideally you would do this more efficiently as part of the ETL process that updates the rows as they come in, rather than all at once, though that would obviously require information about your ETL that you haven't included in your question:
Query
declare @SCDtest table(id int,empid int,Deptid varchar(10),Ename varchar(50),DeptName varchar(50),city varchar(50),startdate datetime,Enddate datetime);
insert into @SCDtest values
 (1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020')
,(2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020')
,(3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020')
,(4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL)
,(5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL);

-- l.l holds the id of the previous row for the same empid (NULL for the first row)
with l as
(
    select *
          ,lag(id,1) over (partition by empid order by id) as l
    from @SCDtest
)
select l.id
      ,l.empid
      ,l.Deptid
      ,l.Ename
      ,l.DeptName
      ,l.city
      ,l.startdate
      ,l.Enddate
       -- build ', col' for each changed column, then strip the leading ', '
      ,stuff(concat(case when l.Deptid <> t.Deptid then ', Deptid' end
                   ,case when l.Ename <> t.Ename then ', Ename' end
                   ,case when l.DeptName <> t.DeptName then ', DeptName' end
                   ,case when l.city <> t.city then ', city' end
                   )
            ,1,2,''
            ) as ColumnUpdated
from l
    left join @SCDtest as t
        on l.l = t.id
order by l.empid
        ,l.startdate;
Output
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
| id | empid | Deptid | Ename | DeptName | city | startdate | Enddate | ColumnUpdated |
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
| 1 | 1 | D1 | Mike | Account | Atlanta | 2020-07-31 00:00:00.000 | 2020-08-03 00:00:00.000 | NULL |
| 3 | 1 | D1 | Ross | Account | Atlanta | 2020-08-04 00:00:00.000 | 2020-08-07 00:00:00.000 | Ename |
| 5 | 1 | D1 | John | Marketing | Boston | 2020-08-08 00:00:00.000 | NULL | Ename, DeptName, city |
| 2 | 2 | D2 | Roy | IT | New York | 2020-07-31 00:00:00.000 | 2020-08-05 00:00:00.000 | NULL |
| 4 | 2 | D2 | Roy | IT | Los angeles | 2020-08-05 00:00:00.000 | NULL | city |
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
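The same comparison can probably be written without joining back to the table, by applying LAG to each tracked column directly. This is a minimal sketch against the SCDtest table from the question; the prev_ column names are illustrative. Like the query above, it relies on `<>`, so a change to or from NULL would not be reported.

```sql
-- Sketch: fetch the previous row's values per empid with LAG, then compare column by column.
WITH prev AS
(
    SELECT id, empid, Deptid, Ename, DeptName, city, startdate, Enddate
          ,LAG(Deptid)   OVER (PARTITION BY empid ORDER BY id) AS prev_Deptid
          ,LAG(Ename)    OVER (PARTITION BY empid ORDER BY id) AS prev_Ename
          ,LAG(DeptName) OVER (PARTITION BY empid ORDER BY id) AS prev_DeptName
          ,LAG(city)     OVER (PARTITION BY empid ORDER BY id) AS prev_city
    FROM SCDtest
)
SELECT id, empid, Deptid, Ename, DeptName, city, startdate, Enddate
       -- CONCAT skips NULLs; STUFF removes the leading ', ' and yields NULL when nothing changed
      ,STUFF(CONCAT(CASE WHEN Deptid   <> prev_Deptid   THEN ', Deptid'   END
                   ,CASE WHEN Ename    <> prev_Ename    THEN ', Ename'    END
                   ,CASE WHEN DeptName <> prev_DeptName THEN ', DeptName' END
                   ,CASE WHEN city     <> prev_city     THEN ', city'     END
                   )
            ,1,2,''
            ) AS ColumnUpdated
FROM prev
ORDER BY empid, startdate;
```

All four LAG calls share the same PARTITION BY and ORDER BY, so the engine can usually evaluate them over a single sort of the data.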
I have a table of purchase records. It contains every purchase made at a pet shop since 2010.
I need some help returning only the last five purchases of each client.
I tried the query below, but it is not working. It returns the last 5 records of the whole table, not of each client.
SELECT TOP (5) [client_name],
[purchase_date],
[item]
FROM [Pet_Shop]
ORDER BY client_name
WHERE client_name in ('John', 'Mary', 'Austin')
I need this kind of return:
client_name | purchase_date | item
___________________________________
John | 2019-09-14 | food
John | 2019-09-13 | ball
John | 2019-09-12 | shampoo
John | 2019-09-11 | cookie
John | 2019-09-11 | food
Mary | 2019-09-14 | collar
Mary | 2019-07-14 | food
Mary | 2019-06-14 | toy
Mary | 2019-06-14 | hamster
Mary | 2019-05-14 | food
Austin | 2019-09-18 | food
Austin | 2019-09-11 | collar
Austin | 2019-09-10 | toy
Austin | 2019-09-09 | catnip
Austin | 2019-09-11 | food
Use ROW_NUMBER():
SELECT *
FROM (
SELECT
client_name,
purchase_date,
item,
ROW_NUMBER() OVER(PARTITION BY client_name ORDER BY purchase_date desc) rn
FROM Pet_Shop
WHERE client_name in ('John', 'Mary', 'Austin')
) x
WHERE rn <= 5
ORDER BY client_name
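Note that ROW_NUMBER() numbers tied rows in an arbitrary order, and the sample data has clients with two purchases on the same date. If the cut-off at five should be repeatable, a tie-breaker can be added to the window's ORDER BY. A minimal sketch, assuming item is an acceptable secondary sort key (any unique column would work better if the table has one):

```sql
-- Sketch: same query, with a tie-breaker so rows sharing a purchase_date are numbered deterministically.
SELECT client_name, purchase_date, item
FROM (
    SELECT
        client_name,
        purchase_date,
        item,
        ROW_NUMBER() OVER (PARTITION BY client_name
                           ORDER BY purchase_date DESC, item) AS rn
    FROM Pet_Shop
    WHERE client_name IN ('John', 'Mary', 'Austin')
) AS x
WHERE rn <= 5
ORDER BY client_name, purchase_date DESC;
```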
You can use CROSS APPLY like this:
DECLARE @Registry TABLE (client_name VARCHAR(100), purchase_date DATETIME, item INT)
INSERT INTO @Registry
(client_name, purchase_date, item)
VALUES
('Client1', '1/1/2019', 1),
('Client1', '2/1/2019', 2),
('Client1', '3/1/2019', 3),
('Client1', '4/1/2019', 4),
('Client1', '5/1/2019', 5),
('Client1', '6/1/2019', 6),
('Client1', '7/1/2019', 7),
('Client2', '1/1/2019', 1),
('Client2', '2/1/2019', 2),
('Client2', '3/1/2019', 3),
('Client2', '4/1/2019', 4),
('Client2', '5/1/2019', 5),
('Client2', '6/1/2019', 6),
('Client2', '7/1/2019', 7)
-- One row per client, then the five most recent purchases for each of them
;WITH Clients AS (
SELECT client_name FROM @Registry GROUP BY client_name
)
SELECT C.*, P.purchase_date, P.item
FROM Clients AS C
CROSS APPLY
(
SELECT TOP 5 R.purchase_date, R.item
FROM @Registry R
WHERE R.client_name = C.client_name
ORDER BY R.purchase_date DESC
) P
ORDER BY C.client_name, P.purchase_date, P.item
Here is the result:
client_name purchase_date item
Client1 2019-03-01 00:00:00.000 3
Client1 2019-04-01 00:00:00.000 4
Client1 2019-05-01 00:00:00.000 5
Client1 2019-06-01 00:00:00.000 6
Client1 2019-07-01 00:00:00.000 7
Client2 2019-03-01 00:00:00.000 3
Client2 2019-04-01 00:00:00.000 4
Client2 2019-05-01 00:00:00.000 5
Client2 2019-06-01 00:00:00.000 6
Client2 2019-07-01 00:00:00.000 7
I have two tables Customers and Purchases:
Customers table:
+------------+-----------+----------+
| CustomerID | FirstName | Surname |
+------------+-----------+----------+
| 101 | Jeff | Smith |
| 102 | Alex | Jones |
| 103 | Pam | Clark |
| 104 | Zola | Lona |
| 105 | Simphele | Ndima |
| 106 | Andre | Williams |
| 107 | Wayne | Shelton |
| 108 | Bob | Banard |
| 109 | Ken | Davidson |
| 110 | Sally | Ivan |
+------------+-----------+----------+
Purchases table:
+------------+--------------+------------+-----------+
| PurchaseId | PurchaseDate | CustomerID | ProductID |
+------------+--------------+------------+-----------+
| 1 | 2012-08-15 | 105 | a510 |
| 2 | 2012-08-15 | 102 | a510 |
| 3 | 2012-08-15 | 103 | a506 |
| 4 | 2012-08-16 | 105 | a510 |
| 5 | 2012-08-17 | 106 | a507 |
| 6 | 2012-08-17 | 107 | a509 |
| 7 | 2012-08-18 | 108 | a502 |
| 8 | 2012-08-19 | 108 | a510 |
| 9 | 2012-08-19 | 109 | a502 |
| 10 | 2012-08-20 | 110 | a503 |
| 11 | 2012-08-21 | 101 | a510 |
| 12 | 2012-08-22 | 102 | a507 |
+------------+--------------+------------+-----------+
My question (which I have been struggling with for the last 2 days): create a query that will display all the customers who purchased products after five days or more, since their last purchase.
Desired outputs:
+-----------+------------------+
| Firstname | Daysdifference |
+-----------+------------------+
| Alex | 7 |
+-----------+------------------+
select c.FirstName, t.dif as Daysdifference from Customers c
inner join
(
select p1.CustomerID,
datediff(day,p1.PurchaseDate,p2.PurchaseDate) as dif
from purchases p1
inner join purchases p2
on p1.CustomerID=p2.CustomerID
where datediff(day,p1.PurchaseDate,p2.PurchaseDate)>=5
) t
on t.CustomerID= c.CustomerID
Here you go:
DECLARE @Customers TABLE (CustomerID INT, FirstName VARCHAR(30), Surname VARCHAR(30));
DECLARE @Purchases TABLE (PurchaseId INT, PurchaseDate DATE, CustomerID INT, ProductID VARCHAR(10) );
-- Sample data
INSERT INTO @Customers VALUES
(101,'Jeff ' , 'Smith '),
(102,'Alex ' , 'Jones '),
(103,'Pam ' , 'Clark '),
(104,'Zola ' , 'Lona '),
(105,'Simphele' , 'Ndima '),
(106,'Andre ' , 'Williams'),
(107,'Wayne ' , 'Shelton '),
(108,'Bob ' , 'Banard '),
(109,'Ken ' , 'Davidson'),
(110,'Sally ' , 'Ivan ');
INSERT INTO @Purchases VALUES
(1, '2012-08-15' ,105, 'a510'),
(2, '2012-08-15' ,102, 'a510'),
(3, '2012-08-15' ,103, 'a506'),
(4, '2012-08-16' ,105, 'a510'),
(5, '2012-08-17' ,106, 'a507'),
(6, '2012-08-17' ,107, 'a509'),
(7, '2012-08-18' ,108, 'a502'),
(8, '2012-08-19' ,108, 'a510'),
(9, '2012-08-19' ,109, 'a502'),
(10,'2012-08-20' ,110, 'a503'),
(11,'2012-08-21' ,101, 'a510'),
(12,'2012-08-22' ,102, 'a507');
-- Pair every purchase with every other purchase by the same customer
WITH CTE AS (
SELECT Pur1.CustomerID, DATEDIFF(DAY, Pur1.PurchaseDate, Pur2.PurchaseDate) Daysdifference
FROM @Purchases Pur1 INNER JOIN @Purchases Pur2 ON Pur1.CustomerID = Pur2.CustomerID
)
SELECT Cus.FirstName, CTE.Daysdifference
FROM @Customers Cus INNER JOIN CTE ON Cus.CustomerID = CTE.CustomerID
WHERE CTE.Daysdifference >= 5;
Result:
+-----------+------------------+
| Firstname | Daysdifference |
+-----------+------------------+
| Alex | 7 |
+-----------+------------------+
You can solve it like this:
1. Create a ranking by purchase date, descending, partitioned by customer ID.
2. Check the date difference between consecutive ranks to find those customers.
Query below:
; with cte as
(
select
*,
row_number() over(partition by CustomerID order by PurchaseDate desc) r
from
Purchases
)
select
Name= c.FirstName,
Daysdifference =datediff(d,c1.PurchaseDate, c2.PurchaseDate)
from
Customers c join
cte c1
on c.customerid=c1.customerid
join cte c2
on c1.CustomerID=c2.CustomerId
and c1.r-1=c2.r
and datediff(d,c1.PurchaseDate, c2.PurchaseDate) >=5
Since SQL Server 2012 added the LAG and LEAD functions, there is no reason at all to do a self join for something like this.
Note: window functions can be extremely efficient compared to other methods, but they need the help of a proper index to perform at their best (see the additional POC index, keyed on the partitioning column followed by the ordering column, in the test script).
CREATE TABLE #Customers (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(30),
Surname VARCHAR(30)
);
CREATE TABLE #Purchases (
PurchaseId INT PRIMARY KEY,
PurchaseDate DATE,
CustomerID INT,
ProductID VARCHAR(10)
);
INSERT INTO #Customers VALUES
(101,'Jeff ' , 'Smith '),
(102,'Alex ' , 'Jones '),
(103,'Pam ' , 'Clark '),
(104,'Zola ' , 'Lona '),
(105,'Simphele' , 'Ndima '),
(106,'Andre ' , 'Williams'),
(107,'Wayne ' , 'Shelton '),
(108,'Bob ' , 'Banard '),
(109,'Ken ' , 'Davidson'),
(110,'Sally ' , 'Ivan ');
INSERT INTO #Purchases VALUES
(1, '2012-08-15' ,105, 'a510'),
(2, '2012-08-15' ,102, 'a510'),
(3, '2012-08-15' ,103, 'a506'),
(4, '2012-08-16' ,105, 'a510'),
(5, '2012-08-17' ,106, 'a507'),
(6, '2012-08-17' ,107, 'a509'),
(7, '2012-08-18' ,108, 'a502'),
(8, '2012-08-19' ,108, 'a510'),
(9, '2012-08-19' ,109, 'a502'),
(10,'2012-08-20' ,110, 'a503'),
(11,'2012-08-21' ,101, 'a510'),
(12,'2012-08-22' ,102, 'a507');
-- add POC index...
CREATE NONCLUSTERED INDEX ix_POC ON #Purchases (CustomerID, PurchaseDate);
--===========================================================
SELECT
c.FirstName,
p2.Daysdifference
FROM
#Customers c
JOIN (
SELECT
p.CustomerID,
Daysdifference = DATEDIFF(DAY, p.PurchaseDate, LEAD(p.PurchaseDate, 1) OVER (PARTITION BY p.CustomerID ORDER BY p.PurchaseDate))
FROM
#Purchases p
) p2
ON c.CustomerID = p2.CustomerID
WHERE
p2.Daysdifference >= 5;
Results...
FirstName Daysdifference
------------------------------ --------------
Alex 7
I have a question about SQL Server.
Table name: Emp
Id | Pid | Firstname | LastName | Level
1  | 101 | Ram       | Kumar    | 3
1  | 100 | Ravi      | Kumar    | 2
2  | 101 | Jaid      | Balu     | 10
1  | 100 | Hari      | Babu     | 5
1  | 103 | nani      | Jai      | 44
1  | 103 | Nani      | Balu     | 10
3  | 103 | bani      | lalu     | 20
I need to retrieve only the records whose combination of Id and Pid is unique; any combination that appears more than once should be skipped entirely.
The output I want looks like this:
Id |Pid |Firstname| LastName | Level
1 |101 | Ram |Kumar | 3
2 |101 | Jaid |Balu | 10
3 |103 |bani |lalu |20
I found the duplicate records with the query below:
select id,pid,count(*) from emp group by id,pid having count(*) >=2
This query returns the 2 duplicated combinations, which are the ones I need to skip in the output.
Please tell me how to write a query to achieve this in SQL Server.
Since your output consists of the rows whose Id and Pid combination has no duplicates, you can use COUNT with PARTITION BY to achieve the desired result.
Sample Data
CREATE TABLE Emp
([Id] int, [Pid] int, [Firstname] varchar(4), [LastName] varchar(5), [Level] int);
INSERT INTO Emp
([Id], [Pid], [Firstname], [LastName], [Level])
VALUES
(1, 101, 'Ram', 'Kumar', 3),
(1, 100, 'Ravi', 'Kumar', 2),
(2, 101, 'Jaid', 'Balu', 10),
(1, 100, 'Hari', 'Babu', 5),
(1, 103, 'nani', 'Jai', 44),
(1, 103, 'Nani', 'Balu', 10),
(3, 103, 'bani', 'lalu', 20);
Query
SELECT *
FROM
(
SELECT *,rn = COUNT(*) OVER(PARTITION BY ID,PID)
FROM Emp
) Emp
WHERE rn = 1
Output
| Id | Pid | Firstname | LastName | Level |
|----|-----|-----------|----------|-------|
| 1 | 101 | Ram | Kumar | 3 |
| 2 | 101 | Jaid | Balu | 10 |
| 3 | 103 | bani | lalu | 20 |
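An alternative that stays close to the GROUP BY / HAVING query from the question is to keep only the groups with exactly one row. This is a minimal sketch against the same Emp sample table: because each surviving group contains a single row, MIN() simply returns that row's value for every other column.

```sql
-- Sketch: keep (Id, Pid) groups that contain exactly one row.
-- With one row per group, MIN() just passes that row's values through.
SELECT [Id]
     , [Pid]
     , MIN([Firstname]) AS [Firstname]
     , MIN([LastName])  AS [LastName]
     , MIN([Level])     AS [Level]
FROM Emp
GROUP BY [Id], [Pid]
HAVING COUNT(*) = 1;
```

The windowed COUNT version above may be easier to maintain on wide tables, since it does not need an aggregate wrapped around every column.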