I am trying to clean up a history table that is not very useful in its current shape by changing its format.
For how the history table is used, what matters is the time range during which a row was valid.
The current situation:
Unit | Value | HistoryOn |
----------------------------------------
1 | 123 | 2013-01-05 14:16:00
1 | 234 | 2013-01-07 12:12:00
2 | 325 | 2013-01-04 14:12:00
1 | 657 | 2013-02-04 17:11:00
3 | 132 | 2013-04-02 13:00:00
The problem is that as this table grows, it becomes increasingly expensive to find out what status all of my containers had during a certain period (say, the value for every unit on a specific date).
My solution is to create a table in this format:
Unit | Value | HistoryStart | HistoryEnd |
---------------------------------------------------------------------
1 | 123 | 2013-01-05 14:16:00 | 2013-01-07 12:11:59
1 | 234 | 2013-01-07 12:12:00 | 2013-02-04 17:10:59
1 | 657 | 2013-02-04 17:11:00 | NULL
2 | 325 | 2013-01-04 14:12:00 | NULL
3 | 132 | 2013-04-02 13:00:00 | NULL
Note that the NULL value in HistoryEnd here indicates that the row is still representative of the current status.
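With that layout, the lookup I described above becomes a simple range check (a sketch; UnitHistory is an assumed name for the new table):
-- Value of every unit as of a given moment; UnitHistory is the assumed name of the new table.
DECLARE @AsOf datetime = '2013-01-08 00:00:00';
SELECT Unit, Value
FROM UnitHistory
WHERE HistoryStart <= @AsOf
AND (HistoryEnd IS NULL OR HistoryEnd >= @AsOf)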
I have tried to make use of a left join on the table itself using the HistoryOn field. This had the unfortunate side effect of cascading in an undesired manner.
SQL Query used:
SELECT *
FROM webhistory.Units u1 LEFT JOIN webhistory.Units u2 on u1.Unit = u2.Unit
AND u1.HistoryOn < u2.HistoryOn
WHERE u1.Unit = 1
The result of the query is as follows:
Unit | Value | HistoryOn | Unit | Value | HistoryOn |
-------------------------------------------------------------------------------------
1 | 657 | 2013-02-04 17:11:00 | NULL | NULL | NULL
1 | 234 | 2013-01-07 12:12:00 | 1 | 657 | 2013-02-04 17:11:00
1 | 123 | 2013-01-05 14:16:00 | 1 | 657 | 2013-02-04 17:11:00
1 | 123 | 2013-01-05 14:16:00 | 1 | 234 | 2013-01-07 12:12:00
This effect compounds because each entry joins to every entry that is newer than itself instead of only to the first entry that comes after it.
So far I have not been able to come up with a good query for this, and I would welcome any insights or suggestions that could help me solve this migration problem.
Maybe I'm missing something, but this seems to work:
CREATE TABLE #webhist(
Unit int,
Value int,
HistoryOn datetime
)
INSERT INTO #webhist VALUES
(1, 123, '2013-01-05 14:16:00'),
(1, 234, '2013-01-07 12:12:00'),
(2, 325, '2013-01-04 14:12:00'),
(1, 657, '2013-02-04 17:11:00'),
(3, 132, '2013-04-02 13:00:00')
SELECT
u1.Unit
,u1.Value
,u1.HistoryOn AS HistoryStart
,u2.HistoryOn AS HistoryEnd
FROM #webhist u1
OUTER APPLY (
SELECT TOP 1 *
FROM #webhist u2
WHERE u1.Unit = u2.Unit AND u1.HistoryOn < u2.HistoryOn
ORDER BY HistoryOn
) u2
DROP TABLE #webhist
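Note that this returns the next row's HistoryOn unchanged as HistoryEnd. If you want it to stop one second earlier, as in the sample output in the question, the applied value can be wrapped in DATEADD (a variation of the SELECT above; DATEADD over NULL stays NULL, so open-ended rows are unaffected):
SELECT
u1.Unit
,u1.Value
,u1.HistoryOn AS HistoryStart
,DATEADD(second, -1, u2.HistoryOn) AS HistoryEnd
FROM #webhist u1
OUTER APPLY (
SELECT TOP 1 HistoryOn
FROM #webhist u2
WHERE u1.Unit = u2.Unit AND u1.HistoryOn < u2.HistoryOn
ORDER BY u2.HistoryOn
) u2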
First, a data sample:
create table Data(
Unit int,
Value int,
HistoryOn datetime)
insert into Data
select 1,123,'2013-01-05 14:16:00'
union select 1 , 234 , '2013-01-07 12:12:00'
union select 2 , 325 , '2013-01-04 14:12:00'
union select 1 , 657 , '2013-02-04 17:11:00'
union select 3 , 132 , '2013-04-02 13:00:00'
I created a function to calculate HistoryEnd.
Note that I named the table Data.
CREATE FUNCTION dbo.fnHistoryEnd
(
    @Unit as int,
    @HistoryOn as datetime
)
RETURNS datetime
AS
BEGIN
    -- Returns one second before the next HistoryOn for the same unit, or NULL if there is none.
    DECLARE @HistoryEnd as datetime

    select top 1 @HistoryEnd = dateadd(s, -1, d.HistoryOn)
    from Data d
    where d.HistoryOn > @HistoryOn and d.Unit = @Unit
    order by d.HistoryOn asc

    RETURN @HistoryEnd
END
GO
Then, the query is trivial
select *, dbo.fnHistoryEnd(a.Unit, a.HistoryOn) as HistoryEnd from Data a
order by Unit, HistoryOn
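To actually populate the new history table, the same function can drive an INSERT ... SELECT (a sketch; the History target table and its column names are assumptions):
-- Assumed target table for the migrated history; adjust names and types to your schema.
CREATE TABLE History(
Unit int,
Value int,
HistoryStart datetime,
HistoryEnd datetime)

INSERT INTO History (Unit, Value, HistoryStart, HistoryEnd)
SELECT a.Unit, a.Value, a.HistoryOn, dbo.fnHistoryEnd(a.Unit, a.HistoryOn)
FROM Data a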
EDIT
Don't forget the ORDER BY clause in the subquery. Without it, TOP 1 returns an arbitrary matching row, so the value you get back is not guaranteed to be the next entry's HistoryOn. Look at what can happen without it:
CREATE TABLE #webhist(
Unit int,
Value int,
HistoryOn datetime
)
INSERT INTO #webhist VALUES
(1, 234, '2013-01-07 12:12:00'),
(2, 325, '2013-01-04 14:12:00'),
(1, 657, '2013-02-04 17:11:00'),
(3, 132, '2013-04-02 13:00:00'),
(1, 123, '2013-01-05 14:16:00')
select *, (select top 1 historyon from #webhist u2 where u2.historyon > u1.historyon and u1.unit = u2.unit) from #webhist u1;
select *, (select top 1 historyon from #webhist u2 where u2.historyon > u1.historyon and u1.unit = u2.unit order by u2.HistoryOn) from #webhist u1;
drop table #webhist
Related
I have the following table:
EventValue | Person1 | Person2 | Person3 | Person4 | Meta1 | Meta2
-------------------------------------------------------------------------------------------
123 | joePerson01 | samRock01 | nancyDrew01 | steveRogers01 | 505 | 606
321 | steveRogers02 | yoMama01 | ruMo01 | lukeJedi01 | 707 | 808
I want to transform the Person columns into IDs for my destination table, so all of the IDs would be coming from the same Person table in my destination DB:
ID | FirstName | LastName | DatabaseOneID | DatabaseTwoID
----------------------------------------------------------
1 | Joe | Person | joePerson01 | personJoe01
2 | Sam | Rockwell | samRock01 | rockSam01
3 | Nancy | Drew | nancyDrew01 | drewNancy01
4 | Steve | Rogers | steveRogers01 | rogersSteve01
5 | Steve R | Rogers | steveRogers02 | rogersSteve02
6 | Yo | Mama | yoMama01 | mamaYo01
7 | Rufus | Murdock | ruMo01 | moRu01
8 | Luke | Skywalker | lukeJedi01 | jediLuke01
With results like so:
MetaID | EventValue | Person1ID | Person2ID | Person3ID | Person4ID
------------------------------------------------------------------------
1 | 123 | 1 | 2 | 3 | 4
2 | 321 | 5 | 6 | 7 | 8
I currently have a Lookup Transform looking up the first Person column, but couldn't figure out how to convert all 4 Person columns into IDs within the same lookup.
You could do it in one query, use UNPIVOT, or use a scalar function if you think that will be more flexible for your implementation. Then you can wrap it in a view for easy access.
Here is a quick example:
DECLARE
@tb1 TABLE
(
EventValue INT
, Person1 VARCHAR(250)
, Person2 VARCHAR(250)
, Person3 VARCHAR(250)
, Person4 VARCHAR(250)
, Meta1 INT
, Meta2 INT
)
DECLARE
@Person TABLE
(
ID INT
, FirstName VARCHAR(250)
, LastName VARCHAR(250)
, DatabaseOneID VARCHAR(250)
, DatabaseTwoID VARCHAR(250)
)
INSERT INTO @tb1
VALUES
(123,'joePerson01','samRock01','nancyDrew01','steveRogers01',505,606),
(321,'steveRogers02','yoMama01','ruMo01','lukeJedi01',707,808)
INSERT INTO @Person
VALUES
(1,'Joe','Person','joePerson01','personJoe01'),
(2,'Sam','Rockwell','samRock01','rockSam01'),
(3,'Nancy','Drew','nancyDrew01','drewNancy01'),
(4,'Steve','Rogers','steveRogers01','rogersSteve01'),
(5,'SteveR','Rogers','steveRogers02','rogersSteve02'),
(6,'Yo','Mama','yoMama01','mamaYo01'),
(7,'Rufus','Murdock','ruMo01','moRu01'),
(8,'Luke','Skywalker','lukeJedi01','jediLuke01')
SELECT ROW_NUMBER() OVER(ORDER BY EventValue) AS MetaID, *
FROM (
SELECT
t.EventValue
, MAX(CASE WHEN t.Person1 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person1ID
, MAX(CASE WHEN t.Person2 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person2ID
, MAX(CASE WHEN t.Person3 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person3ID
, MAX(CASE WHEN t.Person4 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person4ID
FROM @tb1 t
LEFT JOIN @Person p
ON p.DatabaseOneID IN(t.Person1, t.Person2, t.Person3, t.Person4)
OR p.DatabaseTwoID IN(t.Person1, t.Person2, t.Person3, t.Person4)
GROUP BY t.EventValue
) D
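The UNPIVOT route mentioned above would look roughly like this (a sketch: it turns the four Person columns into rows and joins each value to the Person table; you would still re-pivot the result, for example with MAX(CASE ...) keyed on PersonColumn, to get one ID column per slot):
SELECT u.EventValue
    , u.PersonColumn
    , p.ID AS PersonID
FROM (
    SELECT EventValue, Person1, Person2, Person3, Person4
    FROM @tb1
) t
UNPIVOT (
    -- One row per (EventValue, Person slot); PersonColumn holds the slot name, PersonValue the external ID.
    PersonValue FOR PersonColumn IN (Person1, Person2, Person3, Person4)
) u
LEFT JOIN @Person p
    ON u.PersonValue IN (p.DatabaseOneID, p.DatabaseTwoID)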
I currently have a Lookup Transform looking up the first Person column, but couldn't figure out how to convert all 4 Person columns into IDs within the same lookup.
You cannot do this within the same lookup; you have to add a Lookup Transformation for each column. In your case that means four Lookup Transformations.
If the source and destination databases are on the same server, you can use a SQL query as shown in the other answer. If each database is on a separate server, you have to go with Lookup Transformations, or import the data into a staging table and perform the join in SQL, as sketched below.
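For the staging-table route, the four lookups become four joins against the staged Person table; a rough sketch (stg_Person and SourceEvents are assumed names):
SELECT s.EventValue
    , p1.ID AS Person1ID
    , p2.ID AS Person2ID
    , p3.ID AS Person3ID
    , p4.ID AS Person4ID
FROM SourceEvents s
-- One join per Person column, matching either external ID of the staged person.
LEFT JOIN stg_Person p1 ON s.Person1 IN (p1.DatabaseOneID, p1.DatabaseTwoID)
LEFT JOIN stg_Person p2 ON s.Person2 IN (p2.DatabaseOneID, p2.DatabaseTwoID)
LEFT JOIN stg_Person p3 ON s.Person3 IN (p3.DatabaseOneID, p3.DatabaseTwoID)
LEFT JOIN stg_Person p4 ON s.Person4 IN (p4.DatabaseOneID, p4.DatabaseTwoID)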
I am writing a query to join two different tables under a certain condition. The tables look like this:
(TABLE 2)
date | deal_code | originator | servicer | random |
-----------------------------------------------------
2011 | 001 | commerzbank | SPV1 | 1 |
2012 | 001 | commerzbank | SPV1 | 12 |
2013 | 001 | commerzbank | SPV1 | 7 |
2013 | 005 | unicredit | SPV2 | 7 |
and another table
(TABLE 1)
date | deal_code | amount |
---------------------------
2011 | 001 | 100 |
2012 | 001 | 100 |
2013 | 001 | 100 |
2013 | 005 | 200 |
I would like to have this as the final result
date | deal_code | amount | originator | servicer | random |
--------------------------------------------------------------
2013 | 001 | 100 | commerzbank | SPV1 | 7 |
2013 | 005 | 200 | unicredit | SPV2 | 7 |
I created the following code
select q1.deal_code, q1.date
from table1 q1
where q1.date = (SELECT MAX(t4.date)
FROM table1 t4
WHERE t4.deal_code = q1.deal_code)
that gives me:
(TABLE 3)
date | deal_code | amount |
---------------------------
2013 | 001 | 100 |
2013 | 005 | 200 |
That is the latest observation per deal in table 1. Now I would like to add the originator and servicer information for each deal_code and date. Any suggestions? I hope I have been clear enough. Thanks.
This should do what you are looking for. Please be careful when naming columns. Date is a reserved word and is too ambiguous to be a good name for a column.
declare @Something table
(
SomeDate int
, deal_code char(3)
, originator varchar(20)
, servicer char(4)
, random int
)
insert @Something values
(2011, '001', 'commerzbank', 'SPV1', 1)
, (2012, '001', 'commerzbank', 'SPV1', 12)
, (2013, '001', 'commerzbank', 'SPV1', 7)
, (2013, '005', 'unicredit ', 'SPV2', 7)
declare @SomethingElse table
(
SomeDate int
, deal_code char(3)
, amount int
)
insert @SomethingElse values
(2011, '001', 100)
, (2012, '001', 100)
, (2013, '001', 100)
, (2013, '005', 200)
select x.SomeDate
, x.deal_code
, x.originator
, x.servicer
, x.random
, x.amount
from
(
select s.SomeDate
, s.deal_code
, s.originator
, s.servicer
, s.random
, se.amount
, RowNum = ROW_NUMBER()over(partition by s.deal_code order by s.SomeDate desc)
from @Something s
join @SomethingElse se on se.SomeDate = s.SomeDate and se.deal_code = s.deal_code
) x
where x.RowNum = 1
Looks like this would work:
DECLARE @MaxYear INT;
SELECT @MaxYear = MAX(date)
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.deal_code = t2.deal_code;
SELECT t1.date,
t1.deal_code,
t1.amount,
t2.originator,
t2.servicer,
t2.random
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.date = t2.date
AND t1.date = @MaxYear
AND t1.deal_code = t2.deal_code;
I agree with Sean Lange about the date column name. His method gets around the dependency on the correlated sub-query, but at the heart of things, you really just need to join table2 onto your existing query in order to bring the originator, servicer, and random columns into your result set.
select
q2.date,
q2.deal_code,
q1.amount,
q2.originator,
q2.servicer,
q2.random
from
table1 q1
join
table2 q2
on q1.date = q2.date
and q1.deal_code = q2.deal_code
where q1.date = (SELECT MAX(t4.date)
FROM table1 t4
WHERE t4.deal_code = q1.deal_code)
I'm using SQL Server 2012 version 11.0.6020.0. Apologies in advance, I'm new to SQL.
I have one table for person IDs. Due to duplication, a person can have multiple IDs. In an attempt to clean this up, a master ID was created. However, duplicates still exist. Currently, it looks like this...
IF OBJECT_ID('tempdb..#table1') IS NOT NULL
BEGIN
DROP TABLE #table1
END
CREATE TABLE #table1 (MasterID varchar(1), PersonID1 varchar(3), PersonID2 varchar(3), PersonID3 varchar(3), PersonID4 varchar(3), PersonID5 varchar(3))
INSERT INTO #table1 VALUES ('A', '12', '34', '56', '78', null);
INSERT INTO #table1 VALUES ('B', '34', '12', '90', null, null);
INSERT INTO #table1 VALUES ('C', '777', '888', null, null, null);
The table looks like this when the code above is executed.
+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A | 12 | 34 | 56 | 78 | |
| B | 34 | 12 | 90 | | |
| C | 777 | 888 | | | |
+----------+-----------+-----------+-----------+-----------+-----------+
MasterID A and MasterID B are the same person because some of the PersonIDs overlap. MasterID C is a different person because it shares none of the IDs. If one ID is shared, then it is safe for me to assume it is the same patient. So the output I want is ...
+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A | 12 | 34 | 56 | 78 | 90 |
| C | 777 | 888 | | | |
+----------+-----------+-----------+-----------+-----------+-----------+
I thought about unpivoting the data and grouping it.
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
BEGIN
DROP TABLE #t1
END
SELECT MasterID, PersonID
INTO #t1
FROM
(
SELECT MasterID, PersonID1, PersonID2, PersonID3, PersonID4, PersonID5
FROM #table1
) t1
UNPIVOT
(
PersonID FOR PersonIDs IN (PersonID1, PersonID2, PersonID3, PersonID4, PersonID5)
) AS up
GO
---------------------------------------------------
SELECT min(MasterID) as MasterID, PersonID
FROM #t1
GROUP BY PersonID
ORDER BY 1, 2
However, this solution leaves me with the result below, where it looks like 90 is its own person.
+----------+-----------+
| MasterID | PersonID |
+----------+-----------+
| A | 12 |
| A | 34 |
| A | 56 |
| A | 78 |
| B | 90 |
| C | 777 |
| C | 888 |
+----------+-----------+
I looked through Stack Overflow and the closest solution I found is this, but it involves two tables, whereas mine is within the same table:
SQL UPDATE SET one column to be equal to a value in a related table referenced by a different column?
I also found this, but the MAX aggregate function probably won't work for my case: Merge two rows in SQL
This solution looks like it would work, but it would require me to manually check each field for duplicate PersonIDs before updating my MasterID: set a row equal to another row in the same table apart the primary key column
My goal is to have SQL check for duplicates and, if any are found, remove the duplicate rows and add the new PersonIDs to the surviving row. As for which MasterID to keep, it doesn't matter whether I keep A or B.
Let me know if you know of any solutions or can direct me to one. I'm new to SQL, so I may be searching with the wrong keywords. Thanks, I really appreciate it!
Please try the following query. It adds a MainMasterID column to identify the main MasterID for each record.
select *,
(select min(MasterID)
from #table1 t2
where t1.PersonID1 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID2 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID3 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID4 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID5 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
) AS MainMasterID
from #table1 t1
/* Sample data output
MasterID PersonID1 PersonID2 PersonID3 PersonID4 PersonID5 MainMasterID
-------- --------- --------- --------- --------- --------- ------------
A 12 34 56 78 NULL A
B 34 12 90 NULL NULL A
C 777 888 NULL NULL NULL C
*/
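Building on this, the MainMasterID can be combined with the unpivoted #t1 table from the question to regroup every PersonID under one surviving master. A sketch (re-pivoting the result back into PersonID1..PersonID5 columns would still take a ROW_NUMBER/PIVOT step, and like the query above it resolves one level of overlap, not long chains of duplicates):
;WITH Mapped AS
(
    -- Map each original MasterID to the smallest MasterID that shares any PersonID with it.
    SELECT t1.MasterID,
           (SELECT MIN(t2.MasterID)
            FROM #table1 t2
            WHERE t1.PersonID1 IN (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
               OR t1.PersonID2 IN (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
               OR t1.PersonID3 IN (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
               OR t1.PersonID4 IN (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
               OR t1.PersonID5 IN (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
           ) AS MainMasterID
    FROM #table1 t1
)
-- One row per surviving master and distinct PersonID.
SELECT m.MainMasterID AS MasterID, up.PersonID
FROM Mapped m
JOIN #t1 up ON up.MasterID = m.MasterID
GROUP BY m.MainMasterID, up.PersonID
ORDER BY m.MainMasterID, up.PersonID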
I am looking at a SQL Server 2008 Database with two Tables, each with a PK (INT) column and a DateTime column.
There is no explicit relationship between the tables, except that I know the application tends to insert into the database in pairs, one row into each table, with DateTimes that never seem to match exactly but are usually pretty close.
I am trying to match back up the PKs in each table by finding the closest matching DateTime in the other table. Each PK can only be used once for this matching.
What is the best way to do this?
EDIT: Sorry, please find below some example input and the desired output.
+-------+-------------------------+
| t1.PK | t1.DateTime |
+-------+-------------------------+
| 1 | 2016-08-11 00:11:03.000 |
| 2 | 2016-08-11 00:11:08.000 |
| 3 | 2016-08-11 11:03:00.000 |
| 4 | 2016-08-11 11:08:00.000 |
+-------+-------------------------+
+-------+-------------------------+
| t2.PK | t2.DateTime |
+-------+-------------------------+
| 1 | 2016-08-11 11:02:00.000 |
| 2 | 2016-08-11 00:11:02.000 |
| 3 | 2016-08-11 22:00:00.000 |
| 4 | 2016-08-11 11:07:00.000 |
| 5 | 2016-08-11 00:11:07.000 |
+-------+-------------------------+
+-------+-------+-------------------------+-------------------------+
| t1.PK | t2.PK | t1.DateTime | t2.DateTime |
+-------+-------+-------------------------+-------------------------+
| 1 | 2 | 2016-08-11 00:11:03.000 | 2016-08-11 00:11:02.000 |
| 2 | 5 | 2016-08-11 00:11:08.000 | 2016-08-11 00:11:07.000 |
| 3 | 1 | 2016-08-11 11:03:00.000 | 2016-08-11 11:02:00.000 |
| 4 | 4 | 2016-08-11 11:08:00.000 | 2016-08-11 11:07:00.000 |
+-------+-------+-------------------------+-------------------------+
JOIN to the row with lowest DATEDIFF (in seconds) between t1.DateTime and t2.DateTime.
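In T-SQL that idea can be written with OUTER APPLY and TOP 1 (a rough sketch; t1 and t2 stand in for your two tables, and on its own it does not enforce the rule that each PK is used only once):
SELECT t1.PK AS t1_PK
    , m.PK AS t2_PK
    , t1.[DateTime] AS t1_DateTime
    , m.[DateTime] AS t2_DateTime
FROM t1
OUTER APPLY (
    -- For each t1 row, pick the t2 row whose DateTime is closest in absolute seconds.
    SELECT TOP 1 t2.PK, t2.[DateTime]
    FROM t2
    ORDER BY ABS(DATEDIFF(second, t1.[DateTime], t2.[DateTime]))
) m;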
You can achieve the result you are looking for by cross joining table 1 with table 2 and then getting the difference between the dates in seconds, as per Tab Alleman's suggestion. The next step is to rank each match using the ROW_NUMBER() function. The final step is to select only the rows where Rank = 1.
The following example demonstrates this using your example data:
DECLARE #t1 TABLE
(
ID INT PRIMARY KEY
,[DateTime] DATETIME
);
DECLARE #t2 TABLE
(
ID INT PRIMARY KEY
,[DateTime] DATETIME
)
INSERT INTO #t1
(
ID
,[DateTime]
)
VALUES
(1 ,'2016-08-11 00:11:03.000'),
(2 ,'2016-08-11 00:11:08.000'),
(3 ,'2016-08-11 11:03:00.000'),
(4 ,'2016-08-11 11:08:00.000');
INSERT INTO #t2
(
ID
,[DateTime]
)
VALUES
(1, '2016-08-11 11:02:00.000'),
(2, '2016-08-11 00:11:02.000'),
(3, '2016-08-11 22:00:00.000'),
(4, '2016-08-11 11:07:00.000'),
(5, '2016-08-11 00:11:07.000');
WITH CTE_DateDifference
AS
(
SELECT t1.ID AS T1_ID
,t2.ID AS T2_ID
,t1.[DateTime] AS T1_DateTime
,t2.[DateTime] AS T2_DateTime
,ABS(DATEDIFF(SECOND, t1.[DateTime], t2.[DateTime])) AS Duration -- Determine the difference between the dates in seconds.
FROM #t1 t1
CROSS JOIN #t2 t2
),CTE_RankDateMatch
AS
(
SELECT T1_ID
,T2_ID
,T1_DateTime
,T2_DateTime
,ROW_NUMBER() OVER (PARTITION BY T1_ID ORDER BY Duration) AS [Rank] -- Rank each match; row numbers are ordered by the duration between the dates, so rank 1 is the closest match between the two tables.
FROM CTE_DateDifference
)
-- Finally select out the rows with a Rank equal to 1.
SELECT *
FROM CTE_RankDateMatch
WHERE [Rank] = 1
I have a question about SQL Server.
Table name: Emp
Id | Pid | Firstname | LastName | Level
1  | 101 | Ram       | Kumar    | 3
1  | 100 | Ravi      | Kumar    | 2
2  | 101 | Jaid      | Balu     | 10
1  | 100 | Hari      | Babu     | 5
1  | 103 | nani      | Jai      | 44
1  | 103 | Nani      | Balu     | 10
3  | 103 | bani      | lalu     | 20
Here I need to retrieve only the records that are unique on the Id and Pid columns; records that have duplicates on those columns need to be skipped.
Finally, I want output like below:
Id | Pid | Firstname | LastName | Level
1  | 101 | Ram       | Kumar    | 3
2  | 101 | Jaid      | Balu     | 10
3  | 103 | bani      | lalu     | 20
I found the duplicate records with the query below:
select id,pid,count(*) from emp group by id,pid having count(*) >=2
This query finds the (Id, Pid) combinations that occur two or more times; those are the records that need to be skipped in the output.
Please tell me how to write a query to achieve this task in SQL Server.
Since your output keeps only the (ID, PID) combinations that have no duplicates, you can use COUNT with a partition to achieve your desired result.
SQL Fiddle
Sample Data
CREATE TABLE Emp
([Id] int, [Pid] int, [Firstname] varchar(4), [LastName] varchar(5), [Level] int);
INSERT INTO Emp
([Id], [Pid], [Firstname], [LastName], [Level])
VALUES
(1, 101, 'Ram', 'Kumar', 3),
(1, 100, 'Ravi', 'Kumar', 2),
(2, 101, 'Jaid', 'Balu', 10),
(1, 100, 'Hari', 'Babu', 5),
(1, 103, 'nani', 'Jai', 44),
(1, 103, 'Nani', 'Balu', 10),
(3, 103, 'bani', 'lalu', 20);
Query
SELECT *
FROM
(
SELECT *,rn = COUNT(*) OVER(PARTITION BY ID,PID)
FROM Emp
) Emp
WHERE rn = 1
Output
| Id | Pid | Firstname | LastName | Level |
|----|-----|-----------|----------|-------|
| 1 | 101 | Ram | Kumar | 3 |
| 2 | 101 | Jaid | Balu | 10 |
| 3 | 103 | bani | lalu | 20 |
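Alternatively, starting from the HAVING query you already wrote, you can keep only the (Id, Pid) groups that occur exactly once and join back to Emp; a sketch that returns the same three rows:
SELECT e.*
FROM Emp e
JOIN (
    -- (Id, Pid) combinations that appear exactly once.
    SELECT Id, Pid
    FROM Emp
    GROUP BY Id, Pid
    HAVING COUNT(*) = 1
) d
    ON d.Id = e.Id
   AND d.Pid = e.Pid;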