Compare Two SQL Tables for Unique Cells and Update Master Table - sql-server

I'm using SQL Server 2017 and I've been trying to figure this out for hours. My goal is to compare two tables and insert only the NEW rows, based on UNIQUE cells. Every row has an ID number, but I have not assigned a primary key. A row from the new table should be added ONLY if it does not match an existing row on every compared column. This is how my tables are set up now.
Old-Data (Table name is Test1)
FName | LName | Address | City | State | Zipcode | Phone | Phone2 | ID
Frank | Smith | 444 Main | Y'all | TX | 77484 | 281-788-9898 | NULL | 1
Thomas | Parker | 343 Tire | Y'all | TX | 77484 | 281-788-5453 | NULL | 2
Ben | Krull | 232 Wheel | Y'all | TX | 77484 | 281-788-9535 | NULL | 3
New-Data (Table name is Test2)
FName | LName | Address | City | State | Zipcode | Phone | Phone2 | ID
Frank | Smith | 444 Main | Y'all | TX | 77484 | 281-788-9898 | NULL | 1
Thomas | Parker | 343 Tire | Y'all | TX | 77484 | 281-788-5453 | NULL | 2
Ben | Krull | 232 Wheel | Y'all | TX | 77484 | 281-788-9535 | NULL | 3
Juan | Roberto | 444 Gas | Y'all | TX | 77484 | 281-788-3434 | NULL | 4
Ben | Krull | 232 Wheel | Y'all | TX | 77484 | 281-788-9535 | 713-545-4353 | 5
As you can see, IDs 1, 2 and 3 are identical in both tables. ID 4 is a completely unique row, as is ID 5 because of the Phone2 entry. I found some code and modified it a bit to match the headers I care about, to help me determine which entries are duplicates. This is the code that has been driving me crazy.
INSERT TEST1 (Name
,Last_Name
,Address
,City
,State
,Zip_Code
,Phone
,Phone2
)
SELECT Name
,Last_Name
,Address
,City
,State
,Zip_Code
,Phone
,Phone2
FROM TEST2
WHERE TEST2.NAME not in (select Name from test1)
AND TEST2.Address not in (select Address from test1)
AND TEST2.City not in (select City from test1)
AND TEST2.State not in (select State from test1)
AND TEST2.Zip_Code not in (select Zip_Code from test1)
AND TEST2.Phone not in (select Phone from test1)
AND TEST2.Phone2 not in (select phone2 from test1)
I'm trying to match all the fields, and if a unique CELL is found the new row should be entered into the old-data table. I see no errors after executing it, but nothing happens either. Interestingly enough, if I remove all the code below the line that says "WHERE TEST2.NAME not in (select Name from test1)", ID 4 (Juan Roberto) is transferred over, but nothing happens with ID 5.
I'm really starting to think WHERE cannot be used to compare the duplicates and modify or add entries, but I could be wrong. A merge feature would be awesome, but I'm happy with just the former since I could always run a different script to clean up the table for dupes. I'm hoping somebody might be able to point me in the right direction since I've got millions of rows in different tables that need to be compared and trimmed down. Thanks.

Try the following code. I have not tested it, so I am not sure it will work for you.
SELECT * INTO #TEMP FROM Test2(NOLOCK);
DELETE #TEMP
FROM #TEMP
INNER JOIN Test1
ON #TEMP.NAME = Test1.NAME
AND #TEMP.Address = Test1.Address
AND #TEMP.City = Test1.City
AND #TEMP.State = Test1.State
AND #TEMP.Zip_Code = Test1.Zip_Code
AND #TEMP.Phone = Test1.Phone
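-- NULL = NULL is never true, so rows with no second phone must be matched explicitly
AND (#TEMP.Phone2 = Test1.Phone2
OR (#TEMP.Phone2 IS NULL AND Test1.Phone2 IS NULL)) ;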
INSERT INTO Test1
SELECT * FROM #TEMP;
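A set-based alternative may be worth trying here. In the asker's query every NOT IN test is evaluated per column, and a NOT IN against a column that contains NULL (as Phone2 does) can never evaluate to true, which would explain why nothing is inserted. The sketch below instead compares whole rows with EXCEPT, which treats two NULLs as equal. It assumes the column names from the asker's INSERT statement and that the ID column in Test1 is either an identity column or allows NULL; neither is confirmed by the question.

```sql
-- Insert every row of Test2 that has no exact match in Test1.
-- EXCEPT compares all listed columns at once and treats NULL as equal to NULL.
INSERT INTO Test1 (Name, Last_Name, Address, City, State, Zip_Code, Phone, Phone2)
SELECT Name, Last_Name, Address, City, State, Zip_Code, Phone, Phone2
FROM Test2
EXCEPT
SELECT Name, Last_Name, Address, City, State, Zip_Code, Phone, Phone2
FROM Test1;
```

With the sample data above this would add the Juan Roberto row and the second Ben Krull row. EXCEPT also returns distinct rows, so exact duplicates inside Test2 are inserted only once.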

Related

How to add data to a single column

I have a question about adding data to a particular column of a table. I posted yesterday, and a user guided me (thanks for that) and said an UPDATE was the way to go for what I need, but I still can't achieve my goal.
I have two tables: the table the information will be added from and the table the information will be added to. Here is an example:
source_table (has only a column called "name_expedient_reviser" that is nvarchar(50))
name_expedient_reviser
kim
randy
phil
cathy
josh
etc.
On the other hand, I have the destination table. It has two columns: one with the IDs and one where the names will be inserted. The name values are currently NULL, and the existing IDs are going to be used for this.
This is what the other table looks like:
dbo_expedient_reviser (has 2 columns: unique_reviser_code, numeric, PK, not auto-increment; and name_expedient_reviser, nvarchar(50), which holds the users who check expedients). This is how the table looks now:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | NULL
2 | NULL
3 | NULL
4 | NULL
5 | NULL
6 | NULL
What I need is for the information in source_table to be inserted into the column name_expedient_reviser, so the result should look like this:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | kim
2 | randy
3 | phil
4 | cathy
5 | josh
6 | etc.
How can I pass the information into this table? What do I have to do?
EDIT
The query I saw that should have worked, but doesn't update anything, is this one:
UPDATE dbo_expedient_reviser
SET dbo_expedient_reviser.name_expedient_reviser = source_table.name_expedient_reviser
FROM source_table
JOIN dbo_expedient_reviser ON source_table.name_expedient_reviser = dbo_expedient_reviser.name_expedient_reviser
WHERE dbo_expedient_reviser.name_expedient_reviser IS NULL
The query was supposed to update the table with the information from source_table wherever name_expedient_reviser is NULL, which it is in every row, but it doesn't work.
Since the Names do not have an Id associated with them I would just use ROW_NUMBER and join on ROW_NUMBER = unique_reviser_code. The only problem is, knowing what rows are null. From what I see, they all appear null. In your data, is this the case or are there names sporadically in the table like 5,17,29...etc? If the name_expedient_reviser is empty in dbo_expedient_reviser you could also truncate the table and insert values directly. Hopefully that unique_reviser_code isn't already linked to other things.
WITH CTE (name_expedient_reviser, unique_reviser_code)
AS
(
SELECT name_expedient_reviser
,ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
FROM source_table
)
UPDATE er
SET er.name_expedient_reviser = cte.name_expedient_reviser
FROM dbo_expedient_reviser er
JOIN CTE on cte.unique_reviser_code = er.unique_reviser_code
Or Truncate:
Truncate Table dbo_expedient_reviser
INSERT INTO dbo_expedient_reviser (unique_reviser_code, name_expedient_reviser)
SELECT DISTINCT
unique_reviser_code = ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
,name_expedient_reviser
FROM source_table
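If the existing codes are not a gap-free 1, 2, 3, ... sequence, joining ROW_NUMBER directly to unique_reviser_code would skip some rows. A possible variation, sketched here under the assumption that any pairing of names to empty rows is acceptable, numbers both sides and joins on that number. The CTE names src and dst and the alias rn are made up for illustration.

```sql
-- Number the source names and the still-empty destination rows, then pair them up.
WITH src AS (
    SELECT name_expedient_reviser,
           ROW_NUMBER() OVER (ORDER BY name_expedient_reviser) AS rn
    FROM source_table
),
dst AS (
    SELECT unique_reviser_code,
           name_expedient_reviser,
           ROW_NUMBER() OVER (ORDER BY unique_reviser_code) AS rn
    FROM dbo_expedient_reviser
    WHERE name_expedient_reviser IS NULL
)
UPDATE dst
SET name_expedient_reviser = src.name_expedient_reviser
FROM dst
JOIN src ON src.rn = dst.rn;
```

Because source_table has no column that records the original order, the names are paired alphabetically here, so the lowest empty code receives the alphabetically first name rather than the first name listed in the question.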
It is not possible to INSERT data into a single column of existing rows; an UPDATE that copies in the data you want is the only way to go in that case.

Getting only new data from SQL Server

I run an application that gets data from SQL Server periodically. The problem is that I want to get only the records created since the last read.
Let me show an example.
At first, (at 12:00:00)
Table
---------
orderId orderName
1 name1
2 name2
3 name3
4 name3
I'm going to select all data here.
SELECT *
FROM TABLE
After one minute, some data was added into TABLE like below,
Table
---------
orderId orderName
1 name1
2 name2
3 name3
4 name3
5 name4
6 name5
At this point, when I select the data like below,
select *
from TABLE
what I want to get is row no. 5 and row no.6, which is added after I selected before.
My idea is that I need to keep an identifier, such as a transaction ID or something like that, but I still don't have a clear idea how... Can you help me? Any keywords or links that would help?
It would help me implement a refresh event that gets only the new data.
Thank you guys, here's the more specific table I have.
Table
---------
orderId orderTime
1 2017-9-4 8:00:00.000
1 2017-9-4 8:00:10.000
1 2017-9-4 8:00:20.000
1 2017-9-4 8:00:30.000
1 2017-9-4 8:00:40.000
1 2017-9-4 8:00:50.000
2 2017-9-4 8:00:11.000
2 2017-9-4 8:00:20.000
2 2017-9-4 8:00:32.000
2 2017-9-4 8:00:40.000
The time records are added continuously for each orderId independently. Most of the ideas about keeping the last orderId may not work for my case. How do I handle this specific case?
There is no keyword that identifies the rows added since the last execution, because your two queries run at different times in different sessions, so SQL Server cannot tell them apart.
But there is another way to do this task.
Keep the maximum value of the identifier from the last read. Store it in a variable, the session, a parameter, etc.
select *
from table
where orderId > yourvariableValue
You need to always save somewhere the largest orderId you've got at your last select and then, knowing that orderId, use it in your select, like this:
select ...
from ...
where orderId > 4
I see a lot of vague suggestions for saving in sessions, variables etc. I suggest you just use the table itself:
Find the last key in the local table:
SELECT ISNULL(MAX(OrderID),0) FROM TABLE
Use that to decide what to select out from your remote table
DECLARE @MaxID INT
SELECT @MaxID = ISNULL(MAX(OrderID),0) FROM TABLE
INSERT INTO TABLE (OrderID, Col1, Col2)
SELECT OrderID, Col1, Col2
FROM REMOTETABLE
WHERE OrderID > @MaxID
I see you have a recent edit saying this may not work. You need to explain why otherwise no one has any chance of helping. Is it because the OrderID might not always increment? Can it go backwards? You need to explain.
Store the last orderId and the associated last orderTime in a variable.
This should work as you want.
DECLARE @lastOrderId int = 2
DECLARE @lastOrderTime datetime = '2017-9-4 8:00:40.000'
SELECT * FROM Table
WHERE (orderId = @lastOrderId AND orderTime > @lastOrderTime) OR orderId > @lastOrderId
ORDER BY orderId, orderTime;
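Since the asker says rows keep arriving for every orderId independently, a single (orderId, orderTime) pair could miss new rows for lower orderIds. One possible approach, sketched here with hypothetical names (dbo.Orders for the polled table, dbo.OrderWatermark for a small tracking table), is to remember the newest orderTime already read for each orderId.

```sql
-- One-time setup: newest orderTime already read, per orderId (hypothetical table).
CREATE TABLE dbo.OrderWatermark (
    orderId       int      NOT NULL PRIMARY KEY,
    lastOrderTime datetime NOT NULL
);
GO

-- Each poll: collect rows newer than the watermark of their own orderId.
IF OBJECT_ID('tempdb..#NewRows') IS NOT NULL DROP TABLE #NewRows;

SELECT o.orderId, o.orderTime
INTO #NewRows
FROM dbo.Orders AS o
LEFT JOIN dbo.OrderWatermark AS w ON w.orderId = o.orderId
WHERE w.orderId IS NULL               -- orderId never seen before
   OR o.orderTime > w.lastOrderTime;  -- newer than the last row read

-- Advance the watermark to the newest row just read for each orderId.
MERGE dbo.OrderWatermark AS w
USING (SELECT orderId, MAX(orderTime) AS lastOrderTime
       FROM #NewRows
       GROUP BY orderId) AS n
   ON w.orderId = n.orderId
WHEN MATCHED THEN
    UPDATE SET lastOrderTime = n.lastOrderTime
WHEN NOT MATCHED THEN
    INSERT (orderId, lastOrderTime) VALUES (n.orderId, n.lastOrderTime);

-- Hand the new rows to the application.
SELECT orderId, orderTime FROM #NewRows ORDER BY orderId, orderTime;
```

This assumes orderTime only ever increases within an orderId. A row that arrives later with a timestamp equal to or older than the stored watermark would be missed.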

Delete latest entry in SQL Server without using datetime or ID

I have a basic SQL Server delete script that goes:
Delete from tableX
where colA = ? and colB = ?;
In tableX, I do not have any columns indicating sequential IDs or timestamps; just varchar. I want to delete the latest entry that was inserted, and I do not have access to the row number from the insert script. TOP is not an option because it's random. Also, this particular table does not have a primary key, and it's not a matter of poor design. Is there any way I can do this? I recall MySQL being able to call something like max(row_number) and also something along the lines of limit one.
ROW_NUMBER exists in SQL Server, too, but it must be used with an OVER (order_by_clause). So... in your case it's impossible for you unless you come up with another sorting algo.
MSDN
Edit: (Examples for George from MSDN ... I'm afraid his company has a Firewall rule that blocks MSDN)
SQL-Code
USE AdventureWorks2012;
GO
SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName, LastName, ROUND(SalesYTD,2,1) AS "Sales YTD"
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0;
Output
Row FirstName LastName SalesYTD
--- ----------- ---------------------- -----------------
1 Linda Mitchell 4251368.54
2 Jae Pak 4116871.22
3 Michael Blythe 3763178.17
4 Jillian Carson 3189418.36
5 Ranjit Varkey Chudukatil 3121616.32
6 José Saraiva 2604540.71
7 Shu Ito 2458535.61
8 Tsvi Reiter 2315185.61
9 Rachel Valdez 1827066.71
10 Tete Mensa-Annan 1576562.19
11 David Campbell 1573012.93
12 Garrett Vargas 1453719.46
13 Lynn Tsoflias 1421810.92
14 Pamela Ansman-Wolfe 1352577.13
Returning a subset of rows
USE AdventureWorks2012;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, RowNumber
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
Using ROW_NUMBER() with PARTITION
USE AdventureWorks2012;
GO
SELECT FirstName, LastName, TerritoryName, ROUND(SalesYTD,2,1),
ROW_NUMBER() OVER(PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS Row
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0
ORDER BY TerritoryName;
Output
FirstName LastName TerritoryName SalesYTD Row
--------- -------------------- ------------------ ------------ ---
Lynn Tsoflias Australia 1421810.92 1
José Saraiva Canada 2604540.71 1
Garrett Vargas Canada 1453719.46 2
Jillian Carson Central 3189418.36 1
Ranjit Varkey Chudukatil France 3121616.32 1
Rachel Valdez Germany 1827066.71 1
Michael Blythe Northeast 3763178.17 1
Tete Mensa-Annan Northwest 1576562.19 1
David Campbell Northwest 1573012.93 2
Pamela Ansman-Wolfe Northwest 1352577.13 3
Tsvi Reiter Southeast 2315185.61 1
Linda Mitchell Southwest 4251368.54 1
Shu Ito Southwest 2458535.61 2
Jae Pak United Kingdom 4116871.22 1
Your current table design does not allow you to determine the latest entry. You have no field to sort on to indicate which record was added last.
You need to redesign or pull that information from the audit tables. If you have a database without audit tables, you might have to find a tool to read the transaction logs, and it will be a very time-consuming and expensive process. Or, if you know the date the records you want to remove were added, you could possibly use a backup from just before this happened to find the records that were added. Just be aware that you might be looking at records changed after this date that you want to keep.
If you need to do this on a regular basis instead of one time to fix some bad data, then you need to properly design your database to include an identity field and possibly a dateupdated field (maintained through a trigger) or audit tables. (In my opinion no database containing information your company is depending on should be without audit tables, one of the many reasons why you should never allow an ORM to design a database, but I digress.) If you need to know the order records were added to a table, it is your responsibility as the developer to create that structure. Databases only store what they are designed to store; if you didn't design it in, then it is not available easily or at all.
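As a rough illustration of the redesign described above, the sketch below adds an identity column so that later inserts carry an insertion order. The column name RowId, the variables @a and @b, and their varchar(50) type are assumptions for illustration only.

```sql
-- Add an ever-increasing number to every row (RowId is a hypothetical name).
ALTER TABLE tableX ADD RowId int IDENTITY(1,1) NOT NULL;
GO

-- Placeholder values for the two parameters of the original delete script.
DECLARE @a varchar(50) = 'some value',
        @b varchar(50) = 'other value';

-- Delete only the most recently inserted row among those that match.
DELETE FROM tableX
WHERE RowId = (SELECT MAX(RowId)
               FROM tableX
               WHERE colA = @a AND colB = @b);
```

The numbers given to rows that already exist when the column is added follow no guaranteed order, so this only identifies the latest entry reliably among rows inserted after the change.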
If (colA +'_'+ colB) cannot be duplicated, try this.
declare @delColumn nvarchar(250)
set @delColumn = (select top 1 DeleteColumn from (
select (colA +'_'+ colB) as DeleteColumn ,
ROW_NUMBER() OVER(ORDER BY colA DESC) as Id from tableX
)b
order by Id desc
)
delete from tableX where (colA +'_'+ colB) = @delColumn

T-SQL How To: Compare and List Duplicate Entries in a Table

SQL Server 2000. Single table has a list of users that includes a unique user ID and a non-unique user name.
I want to search the table and list out any users that share the same non-unique user name. For example, my table looks like this:
ID User Name Name
== ========= ====
0 parker Peter Parker
1 parker Mary Jane Parker
2 heroman Joseph (Joey) Carter Jones
3 thehulk Bruce Banner
What I want to do is do a SELECT and have the result set be:
ID User Name Name
== ========= ====
0 parker Peter Parker
1 parker Mary Jane Parker
from my table.
I'm not a T-SQL guru. I can do the basic joins and such, but I'm thinking there must be an elegant way of doing this. Barring elegance, there must be ANY way of doing this.
I appreciate any methods that you can help me with on this topic. Thanks!
---Dan---
One way
select t1.* from Table t1
join(
select username from Table
group by username
having count(username) >1) t2 on t1.username = t2.username
The simplest way I can think of to do this uses a sub-query:
select * from username un1 where exists
(select null from username un2
where un1.user_name = un2.user_name and un1.id <> un2.id);
The sub-query selects all names that have >1 row with that name... outer query selects all the rows matching those IDs.
SELECT T.*
FROM T
, (SELECT Dupe_candidates.USERNAME
FROM T AS Dupe_candidates
GROUP BY Dupe_candidates.USERNAME
HAVING count(*)>1
) Dupes
WHERE T.USERNAME=Dupes.USERNAME
You can try the following:
SELECT *
FROM dbo.Person as p1
WHERE
(SELECT COUNT(*) FROM dbo.Person AS p2 WHERE p2.UserName = p1.UserName) > 1;
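For readers on SQL Server 2005 or later (so not the SQL Server 2000 instance in the question), a windowed count is another possible way to express the same idea. This sketch reuses the dbo.Person and UserName names from the previous answer and assumes ID and Name columns as in the sample table; NameCount is a made-up alias.

```sql
-- Attach to every row the number of rows sharing its user name,
-- then keep only the rows whose name occurs more than once.
SELECT ID, UserName, Name
FROM (
    SELECT ID, UserName, Name,
           COUNT(*) OVER (PARTITION BY UserName) AS NameCount
    FROM dbo.Person
) AS counted
WHERE NameCount > 1
ORDER BY UserName, ID;
```

The window function has to sit in a derived table because it cannot be referenced directly in a WHERE clause.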

Microsoft T-SQL Counting Consecutive Records

Problem:
From the most current day per person, count the number of consecutive days that each person has received 0 points for being good.
Sample data to work from :
Date Name Points
2010-05-07 Jane 0
2010-05-06 Jane 1
2010-05-07 John 0
2010-05-06 John 0
2010-05-05 John 0
2010-05-04 John 0
2010-05-03 John 1
2010-05-02 John 1
2010-05-01 John 0
Expected answer:
Jane was bad on 5/7 but good the day before that. So Jane was only bad 1 day in a row most recently. John was bad on 5/7, again on 5/6, 5/5 and 5/4. He was good on 5/3. So John was bad the last 4 days in a row.
Code to create sample data:
IF OBJECT_ID('tempdb..#z') IS NOT NULL BEGIN DROP TABLE #z END
select getdate() as Date,'John' as Name,0 as Points into #z
insert into #z values(getdate()-1,'John',0)
insert into #z values(getdate()-2,'John',0)
insert into #z values(getdate()-3,'John',0)
insert into #z values(getdate()-4,'John',1)
insert into #z values(getdate(),'Jane',0)
insert into #z values(getdate()-1,'Jane',1)
select * from #z order by name,date desc
Firstly, I am sorry, but I am new to this system and having trouble figuring out how to work the interface and post properly.
2010-05-13 ---------------------------------------------------------------------------
Joel, thank you so much for your response below! I needed it for a key production job that was running about 60 minutes. Now the job runs in 2 minutes!!
Yes, there was one condition in my case that I needed to address. My source always had only records for those who had been bad recently, so that was not a problem for me. I did, however, have to handle records for people who were never good, and did that with a left join to add those records back in and give them a date so the counting would work for all.
Thanks again for your help. It has opened my mind some more to SET-based logic and how to approach it, and was a HUGE benefit to my production job.
The basic solution here is to first build a set that contains the name of each person and the value of the last day on which that person was good. Then join this set to the original table and group by name to find the count of days > the last good day for each person. You can build the set in either a CTE, a view, or an uncorrelated derived table (sub query) — any of those will work. My example below uses a CTE.
Note that while the concept is sound, this specific example might not return exactly what you want. Your actual needs here depend on what you want to happen for those who have not been good ever and for those who have not been bad recently (ie, you might need a left join to show users who were good yesterday). But this should get you started:
WITH LastGoodDays AS
(
SELECT MAX([date]) as [date], name
FROM [table]
WHERE Points > 0
GROUP BY name
)
SELECT t.name, count(*) As ConsecutiveBadDays
FROM [table] t
INNER JOIN LastGoodDays lgd ON lgd.name = t.name AND t.[date] > lgd.[date]
group by t.name
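The left-join variation mentioned in the note could look roughly like the sketch below, which also counts people who have never had a good day. It runs against the #z sample table from the question and assumes one row per person per day with no gaps.

```sql
WITH LastGoodDays AS
(
    -- Most recent day on which each person scored points
    SELECT name, MAX([date]) AS [date]
    FROM #z
    WHERE Points > 0
    GROUP BY name
)
SELECT t.name, COUNT(*) AS ConsecutiveBadDays
FROM #z AS t
LEFT JOIN LastGoodDays AS lgd ON lgd.name = t.name
WHERE lgd.name IS NULL        -- never good: every row counts
   OR t.[date] > lgd.[date]   -- otherwise only rows after the last good day
GROUP BY t.name;
```

With the sample rows this gives 4 for John and 1 for Jane. Someone whose most recent day was good has no qualifying rows and is simply absent from the result rather than listed with 0.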

Resources