SQL Server: Insert statement duplicating non-duplicate values

I have run into this scenario a couple of times, but it does not happen every time on the same databases while testing. I am merging two separate databases, both with exactly the same structure, into a single database. When inserting records from one database into the other, I see values duplicated in the target database even though they exist only once in the source and not at all in the target.
Example:
DB1..Customer
Cust_ID | Last_Name | First_Name | Phone | Email | Field1
1 | Smith | John | 111-1111 | m@M.com |
DB2..Customer
Cust_ID | Last_Name | First_Name | Phone | Email | Field1
1 | Jones | Steve | 222-2222 | S@S.com |
2 | Smith | Tom | 333-3333 | S@m.com |
When I run my query:
INSERT INTO DB1..Customer (Last_Name, First_Name, Phone, Email, Field1)
SELECT
Last_name, First_Name, Phone, Email, Cust_ID
FROM
DB2..Customer DB2
WHERE
DB2.Cust_ID NOT IN (SELECT DB2.Cust_ID
FROM DB2..Customer DB2
INNER JOIN DB1..Customer DB1 ON DB1.Last_Name = DB2.Last_Name
AND DB1.First_Name = DB2.First_Name
AND DB1.Email = DB2.Email)
Results:
DB1..Customer
Cust_ID | Last_Name | First_Name | Phone | Email | Field1
1 | Smith | John | 111-1111 | m@M.com |
2 | Jones | Steve | 222-2222 | S@S.com | 1
3 | Jones | Steve | 222-2222 | S@S.com | 1
4 | Jones | Steve | 222-2222 | S@S.com | 1
5 | Jones | Steve | 222-2222 | S@S.com | 1
6 | Smith | Tom | 333-3333 | S@m.com | 2
7 | Smith | Tom | 333-3333 | S@m.com | 2
8 | Smith | Tom | 333-3333 | S@m.com | 2
I notice the duplicates when I count rows grouped by Field1: some DB2..Customer.Cust_ID values appear more than once. Since Cust_ID is the primary key, my query should put each value into Field1 only once.
Any ideas or suggestions on why this may be happening? My last run of the query duplicated some rows up to 4 times. It seems to me that SQL Server is caught in a loop, searching for the customer while also writing it to the target database at the same time.
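The count mentioned above can be written as a GROUP BY ... HAVING query that lists every source Cust_ID stored in Field1 more than once. This is a minimal sketch: #Customer is a hypothetical temp table standing in for DB1..Customer, the column types are assumed, and the rows are a subset of the results shown above.

```sql
-- Hypothetical stand-in for DB1..Customer; column types are assumptions
DROP TABLE IF EXISTS #Customer;
CREATE TABLE #Customer (
    Cust_ID    int IDENTITY(1,1) PRIMARY KEY,
    Last_Name  varchar(50),
    First_Name varchar(50),
    Phone      varchar(20),
    Email      varchar(100),
    Field1     int NULL  -- holds the source DB2 Cust_ID
);

-- A subset of the duplicated rows from the results above
INSERT INTO #Customer (Last_Name, First_Name, Phone, Email, Field1)
VALUES ('Smith', 'John',  '111-1111', 'm@M.com', NULL),
       ('Jones', 'Steve', '222-2222', 'S@S.com', 1),
       ('Jones', 'Steve', '222-2222', 'S@S.com', 1),
       ('Smith', 'Tom',   '333-3333', 'S@m.com', 2);

-- Every source Cust_ID (stored in Field1) that arrived more than once
SELECT Field1 AS Source_Cust_ID, COUNT(*) AS Copies
FROM #Customer
WHERE Field1 IS NOT NULL
GROUP BY Field1
HAVING COUNT(*) > 1;
```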

A LEFT JOIN with an IS NULL check may be a little slower, but it is easier to read and does what you want.
INSERT INTO DB1..Customer(
Last_Name
, First_Name
, Phone
, Email
, Field1)
SELECT
B.Last_name
, B.First_Name
, B.Phone
, B.Email
, B.Cust_ID
FROM
DB2..Customer B
LEFT JOIN
DB1..Customer A ON
A.Last_Name = B.Last_Name
AND
A.First_Name = B.First_Name
AND
A.Email = B.Email
AND
A.Phone = B.Phone
WHERE A.Cust_ID IS NULL;
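The same anti-join can also be written with NOT EXISTS. The sketch below assumes hypothetical temp tables #DB1_Customer and #DB2_Customer in place of DB1..Customer and DB2..Customer (column types are guesses) and runs the INSERT twice to show that the second pass adds nothing, because every source row then has a match in the target.

```sql
-- Hypothetical stand-ins for DB1..Customer and DB2..Customer
DROP TABLE IF EXISTS #DB1_Customer;
DROP TABLE IF EXISTS #DB2_Customer;
CREATE TABLE #DB1_Customer (
    Cust_ID int IDENTITY(1,1) PRIMARY KEY,
    Last_Name varchar(50), First_Name varchar(50),
    Phone varchar(20), Email varchar(100), Field1 int NULL);
CREATE TABLE #DB2_Customer (
    Cust_ID int PRIMARY KEY,
    Last_Name varchar(50), First_Name varchar(50),
    Phone varchar(20), Email varchar(100), Field1 int NULL);

INSERT INTO #DB1_Customer (Last_Name, First_Name, Phone, Email)
VALUES ('Smith', 'John', '111-1111', 'm@M.com');
INSERT INTO #DB2_Customer (Cust_ID, Last_Name, First_Name, Phone, Email)
VALUES (1, 'Jones', 'Steve', '222-2222', 'S@S.com'),
       (2, 'Smith', 'Tom',   '333-3333', 'S@m.com');

-- Run the merge twice: the second pass should insert nothing
DECLARE @pass int = 1;
WHILE @pass <= 2
BEGIN
    INSERT INTO #DB1_Customer (Last_Name, First_Name, Phone, Email, Field1)
    SELECT B.Last_Name, B.First_Name, B.Phone, B.Email, B.Cust_ID
    FROM #DB2_Customer AS B
    WHERE NOT EXISTS (SELECT 1
                      FROM #DB1_Customer AS A
                      WHERE A.Last_Name  = B.Last_Name
                        AND A.First_Name = B.First_Name
                        AND A.Email      = B.Email
                        AND A.Phone      = B.Phone);
    SET @pass += 1;
END;

SELECT * FROM #DB1_Customer ORDER BY Cust_ID;  -- 3 rows, no duplicates
```

One caveat that applies to both forms: if any of the compared columns is NULL, the equality test is never true, so such a row is inserted again on every run.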

Could you try changing the aliases used in the outer query and sub-query to be different? I don't have multiple instances at hand to test, but I wonder if it is being interpreted as a correlated subquery.
Try the following query, which uses DB1_Inner/DB2_Inner/DB2_Outer to differentiate the aliases:
Insert into DB1..Customer (Last_Name, First_Name, Phone, Email, Field1)
SELECT Last_name, First_Name, Phone, Email, Cust_ID
from DB2..Customer DB2_Outer
Where DB2_Outer.Cust_ID not in
(Select DB2_Inner.Cust_ID
from DB2..Customer DB2_Inner
Inner Join DB1..Customer DB1_Inner
on DB1_Inner.Last_Name=DB2_Inner.Last_Name
and DB1_Inner.First_Name=DB2_Inner.First_Name
and DB1_Inner.Email=DB2_Inner.Email)

Related

T-SQL Query comparing Member counts between 2 tables

TABLE 1: Data sent to vendor
| MemberID | FirstName | LastName | Etc |
| :------: | :-------: | :------: | :-: |
| 1 | John | Smith | Etc |
| 2 | Jane | Doe | Etc |
| 3 | Dan | Laren | Etc |
TABLE 2: Data returned from vendor
| MemberID | FirstName | LastName | Etc |
| :------: | :-------: | :------: | :-: |
| 1 | John | Smith | Etc |
| 2 | Jane | Doe | Etc |
| 3 | Dan | Laren | Etc |
We send data to a vendor which is used for their matching algorithm and they return the data with new information. The members are matched with a MemberID data element. How would I write a query which shows me which MemberIDs we sent to the vendor but the vendor didn't return?
NOT EXISTS would be my first choice here.
Example
SELECT *
FROM Table1 A
WHERE NOT EXISTS (SELECT 1
FROM Table2 B
WHERE A.MemberID = B.MemberID )
Alternatively, using NOT IN:
SELECT MemberID
FROM Table1
WHERE MemberID NOT IN (SELECT MemberID FROM Table2)
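One caveat with the NOT IN form: if the subquery returns even a single NULL, NOT IN is never true and the query returns no rows, whereas NOT EXISTS is not affected. A small sketch with hypothetical temp tables #Table1 and #Table2; the NULL MemberID is invented data added only to show the effect:

```sql
DROP TABLE IF EXISTS #Table1;
DROP TABLE IF EXISTS #Table2;
CREATE TABLE #Table1 (MemberID int);
CREATE TABLE #Table2 (MemberID int NULL);

INSERT INTO #Table1 VALUES (1), (2), (3);
-- Member 3 was not returned; the NULL row is a hypothetical data problem
INSERT INTO #Table2 VALUES (1), (2), (NULL);

-- NOT IN: 3 <> NULL evaluates to UNKNOWN, so no row qualifies
SELECT MemberID FROM #Table1
WHERE MemberID NOT IN (SELECT MemberID FROM #Table2);

-- NOT EXISTS: the NULL row simply never matches, so member 3 is returned
SELECT A.MemberID FROM #Table1 AS A
WHERE NOT EXISTS (SELECT 1 FROM #Table2 AS B WHERE B.MemberID = A.MemberID);
```

If you prefer NOT IN, adding WHERE MemberID IS NOT NULL inside the subquery avoids the problem.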
Using EXCEPT is one option.
SELECT sent.[MemberID] FROM Tbl1_SentToVendor sent
EXCEPT
SELECT recv.[MemberID] FROM Tbl2_ReturnedFromVendor recv
This compares only MemberID, but the EXCEPT syntax also supports additional columns (e.g., in case you want to filter out data that may be the same as what you already have).
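Extending EXCEPT to more columns returns every row that was sent but did not come back unchanged. A sketch using hypothetical temp-table copies of the answer's tables, with FirstName and LastName as the extra columns; the vendor data (member 3 missing, member 2 with a changed last name) is invented for illustration:

```sql
DROP TABLE IF EXISTS #Tbl1_SentToVendor;
DROP TABLE IF EXISTS #Tbl2_ReturnedFromVendor;
CREATE TABLE #Tbl1_SentToVendor (MemberID int, FirstName varchar(50), LastName varchar(50));
CREATE TABLE #Tbl2_ReturnedFromVendor (MemberID int, FirstName varchar(50), LastName varchar(50));

INSERT INTO #Tbl1_SentToVendor VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe'), (3, 'Dan', 'Laren');
-- Hypothetical vendor file: member 3 missing, member 2's last name changed
INSERT INTO #Tbl2_ReturnedFromVendor VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe-Smith');

-- Rows that were sent but did not come back identical (members 2 and 3)
SELECT MemberID, FirstName, LastName FROM #Tbl1_SentToVendor
EXCEPT
SELECT MemberID, FirstName, LastName FROM #Tbl2_ReturnedFromVendor;
```

Note that EXCEPT returns distinct rows and treats two NULLs as equal, which differs from comparing the same columns with = in a NOT EXISTS.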

SQL Server: how to count values from dynamic columns?

I have data:
+ Subject
___________________
| SubID | SubName |
|-------|---------|
| 1 | English |
|-------|---------|
| 2 | Spanish |
|-------|---------|
| 3 | Korean |
|_______|_________|
+ Student
______________________________________
| StuID | StuName | Gender | SubID |
|---------|---------|--------|--------|
| 1 | David | M | 1,2 |
|---------|---------|--------|--------|
| 2 | Lucy | F | 2,3 |
|_________|_________|________|________|
I want the query result to be:
____________________________________
| SubID | SubName | Female | Male |
|--------|---------|--------|------|
| 1 | English | 0 | 1 |
|--------|---------|--------|------|
| 2 | Spanish | 1 | 1 |
|--------|---------|--------|------|
| 3 | Korean | 1 | 0 |
|________|_________|________|______|
This is my query:
SELECT
SubID, SubName, 0 AS Female, 0 AS Male
FROM Subject
I don't know how to replace 0 with the real count.
Because you made the mistake of storing CSV data in your tables, we will have to do some SQL Olympics to get your result set. We can try joining the two tables on the condition that the SubID from the Subject table appears somewhere in the CSV list of IDs in the Student table. Then, aggregate by subject and count the number of males and females.
SELECT
s.SubID,
s.SubName,
COUNT(CASE WHEN st.Gender = 'F' THEN 1 END) Female,
COUNT(CASE WHEN st.Gender = 'M' THEN 1 END) Male
FROM Subject s
LEFT JOIN Student st
ON ',' + CONVERT(varchar(10), st.SubID) + ',' LIKE
'%,' + CONVERT(varchar(10), s.SubID) + ',%'
GROUP BY
s.SubID,
s.SubName;
But you would be best off refactoring your table design to normalize the data. Here is an example of a student table that looks a bit better:
+---------+---------+--------+--------+
| StuID | StuName | Gender | SubID |
+---------+---------+--------+--------+
| 1 | David | M | 1 |
+---------+---------+--------+--------+
| 1 | David | M | 2 |
+---------+---------+--------+--------+
| 2 | Lucy | F | 2 |
+---------+---------+--------+--------+
| 2 | Lucy | F | 3 |
+---------+---------+--------+--------+
We can go a bit further, and even store the metadata separately from the StuID and SubID relationship. But even using just the above would have avoided the ugly join condition.
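With one row per student and subject, as in the table above, the CSV matching condition goes away and the join becomes a plain equality. A minimal sketch, assuming hypothetical temp tables #Subject and #StudentSubject that follow that layout:

```sql
DROP TABLE IF EXISTS #Subject;
DROP TABLE IF EXISTS #StudentSubject;
CREATE TABLE #Subject (SubID int PRIMARY KEY, SubName varchar(30));
-- Hypothetical normalized table: one row per (student, subject) pair
CREATE TABLE #StudentSubject (StuID int, StuName varchar(30), Gender char(1), SubID int);

INSERT INTO #Subject VALUES (1, 'English'), (2, 'Spanish'), (3, 'Korean');
INSERT INTO #StudentSubject VALUES
    (1, 'David', 'M', 1), (1, 'David', 'M', 2),
    (2, 'Lucy',  'F', 2), (2, 'Lucy',  'F', 3);

-- Expected: English 0/1, Spanish 1/1, Korean 1/0 (Female/Male)
SELECT s.SubID,
       s.SubName,
       COUNT(CASE WHEN st.Gender = 'F' THEN 1 END) AS Female,
       COUNT(CASE WHEN st.Gender = 'M' THEN 1 END) AS Male
FROM #Subject AS s
LEFT JOIN #StudentSubject AS st ON st.SubID = s.SubID  -- plain equality join
GROUP BY s.SubID, s.SubName
ORDER BY s.SubID;
```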
If your SQL Server version supports the STRING_SPLIT function (it requires database compatibility level 130 or higher), you could use it to get the expected results.
create table Subjects
(
SubID int,
SubName varchar(30)
)
insert into Subjects values
(1,'English'),
(2,'Spanish'),
(3,'Korean')
create table student
(
StuID int,
StuName varchar(30),
Gender varchar(10),
SubID varchar(10)
)
insert into student values
(1,'David','M','1,2'),
(2,'Lucy','F','2,3')
--Query
;WITH CTE AS
(
SELECT
S.Gender,
S1.value AS SubID
FROM student S
CROSS APPLY STRING_SPLIT(S.SubID,',') S1
)
select
T.SubID,
T.SubName,
COUNT(CASE T1.Gender WHEN 'F' THEN 1 END) AS Female,
COUNT(CASE T1.Gender WHEN 'M' THEN 1 END) AS Male
from Subjects T
LEFT JOIN CTE T1 ON T.SubID=T1.SubID
GROUP BY T.SubID,T.SubName
ORDER BY T.SubID
--Output
/*
SubID SubName Female Male
----------- ------------------------------ ----------- -----------
1 English 0 1
2 Spanish 1 1
3 Korean 1 0
*/

Insert two column values into a single column in SQL Server

The following two tables, Table 1 and Table 2, are given:
Table 1
+-----+------+---------+
| ID | Name | Earning |
+-----+------+---------+
| 101 | John | HRA |
| 101 | John | Travel |
| 102 | Andy | Travel |
+-----+------+---------+
Table 2
+-----+------+---------+
| ID | Name |Deduction|
+-----+------+---------+
| 101 | John | ENP |
| 102 | Andy | ENP |
| 102 | Andy | RA |
+-----+------+---------+
I need to create a third table, Table 3, with the columns ID, Name, and EarningOrDeduction. I have already created the ID and Name columns; I only need the EarningOrDeduction column.
With this query:
INSERT INTO Table3 (ID, Name, EarningOrDeduction)
SELECT ID, Name, Earning FROM Table1
UNION ALL
SELECT ID, Name, Deduction FROM Table2;
I'm getting
Table 3
+-----+------+------------------+
| ID | Name |EarningOrDeduction|
+-----+------+------------------+
| 101 | John | HRA |
| 101 | John | Travel |
| 102 | Andy | Travel |
| 101 | John | ENP |
| 102 | Andy | ENP |
| 102 | Andy | RA |
+-----+------+------------------+
But I want the output to be:
Table 3
+-----+------+------------------+
| ID | Name |EarningOrDeduction|
+-----+------+------------------+
| 101 | John | HRA |
| 101 | John | Travel |
| 101 | John | ENP |
| 102 | Andy | Travel |
| 102 | Andy | ENP |
| 102 | Andy | RA |
+-----+------+------------------+
You can select the data from both tables with a UNION clause.
And if you don't want to insert values that are already in Table3, use the following query:
INSERT INTO Table3 (EarningOrDeduction)
SELECT X FROM(
SELECT Earning X FROM Table1
UNION
SELECT Deduction X FROM Table2
) T
LEFT JOIN Table3 T3 ON T.X=T3.EarningOrDeduction
WHERE T3.EarningOrDeduction IS NULL
You could try inserting a union of values from the two tables:
INSERT INTO Table3 (ID, Name, EarningOrDeduction)
SELECT ID, Name, Earning FROM Table1
UNION ALL
SELECT ID, Name, Deduction FROM Table2;
Or, if you don't really want to populate Table3 with these values, you could just run the above select without the first insert line.
UNION ALL with ORDER BY should work.
INSERT INTO Table3 (ID, Name, EarningOrDeduction)
SELECT ID, Name, EarningOrDeduction FROM
(SELECT ID, Name, Earning AS [EarningOrDeduction] FROM Table1
UNION ALL
SELECT ID, Name, Deduction FROM Table2) AS T -- a derived table must have an alias
ORDER BY ID, Name;
I assume Earning and Deduction will not produce duplicate values for a particular Name.
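A table has no inherent row order, so the ORDER BY in an INSERT ... SELECT does not decide the order in which rows come back later; the order has to be requested when Table3 is read. To reproduce the desired output, with earnings listed before deductions for each ID, one option is a sort-key column computed inside the UNION ALL. In this sketch #Earn and #Deduct are hypothetical stand-ins for Table1 and Table2, and Src is an invented helper column:

```sql
DROP TABLE IF EXISTS #Earn;
DROP TABLE IF EXISTS #Deduct;
CREATE TABLE #Earn   (ID int, Name varchar(20), Earning varchar(20));    -- stands in for Table1
CREATE TABLE #Deduct (ID int, Name varchar(20), Deduction varchar(20));  -- stands in for Table2

INSERT INTO #Earn   VALUES (101, 'John', 'HRA'), (101, 'John', 'Travel'), (102, 'Andy', 'Travel');
INSERT INTO #Deduct VALUES (101, 'John', 'ENP'), (102, 'Andy', 'ENP'), (102, 'Andy', 'RA');

-- Src = 0 for earnings, 1 for deductions; it is used only for sorting
SELECT ID, Name, EarningOrDeduction
FROM (
    SELECT ID, Name, Earning AS EarningOrDeduction, 0 AS Src FROM #Earn
    UNION ALL
    SELECT ID, Name, Deduction, 1 FROM #Deduct
) AS T
ORDER BY ID, Src, EarningOrDeduction;
```

Sorting each group alphabetically by EarningOrDeduction is an assumption that happens to match the sample; if the original entry order matters, the source tables need a column that records it.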

Selecting from two tables with 1st row from the second table

Table ACCOUNT(Name, Debit, Credit)
Name | Debit | Credit
=========================
Ram | 2,000 | 2,000
Bheem | 3,000 | 3,000
Soorya | 2,500 | 1,750
John | 3,500 | 2,500
Abdul | 1,600 | 00000
Soorya | 1,500 | 00000
Table CLIENTS(Name, ContactNumber)
Name | ContactNumber
======================
Ram | 900800
Bheem | 900700
Soorya | 900600
John | 900400
Abdul | 900100
John | No Value
SQL
SELECT Name, SUM(Debit), SUM(Credit)
FROM ACCOUNT
WHERE SUM(Credit)<>SUM(Debit)
GROUP BY Name & ContactNumber
FROM CLIENTS WHERE ACCOUNT.Name=CLIENTS.Name
If a client's Name exists twice, only the first ContactNumber should be selected.
Expected result:
Name | SUM(Debit) | SUM(Credit) | ContactNumber
==================================================
Soorya | 4,000 | 1,750 | 900600
John | 3,500 | 2,500 | 900400
Abdul | 1,600 | 0000 | 900100
How do I solve this problem?
Not sure if this is the most elegant solution, but it gave the correct answer on the test data provided.
WITH tmp
AS (SELECT Name,
Sum(Debit) AS SumDebit,
Sum(Credit) AS SumCredit
FROM ACCOUNT
GROUP BY Name)
SELECT a.Name,
a.SumDebit,
a.SumCredit,
c.ContactNumber
FROM tmp a,
(SELECT Name,
Max(ContactNumber) AS ContactNumber
FROM clients
GROUP BY Name) c
WHERE a.Name = c.Name
AND a.SumDebit <> a.SumCredit
Try using a JOIN statement, linked by the name fields.
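SELECT a.Name, SUM(a.Debit) AS SumDebit, SUM(a.Credit) AS SumCredit, c.ContactNumber
FROM ACCOUNT a
INNER JOIN (SELECT Name, MIN(ContactNumber) AS ContactNumber -- one contact per name, so the join cannot double the sums
            FROM CLIENTS
            GROUP BY Name) c
ON a.Name = c.Name
GROUP BY a.Name, c.ContactNumber
HAVING SUM(a.Credit) <> SUM(a.Debit) -- filters on aggregates belong in HAVING, not WHERE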
Hope it helps.

select unique rows based on single distinct column [duplicate]

This question already has answers here:
Get top 1 row of each group
I want to select rows that have a distinct email, see the example table below:
+----+---------+-------------------+-------------+
| id | title   | email             | commentname |
+----+---------+-------------------+-------------+
| 3  | test    | rob@hotmail.com   | rob         |
| 4  | i agree | rob@hotmail.com   | rob         |
| 5  | its ok  | rob@hotmail.com   | rob         |
| 6  | hey     | rob@hotmail.com   | rob         |
| 7  | nice!   | simon@hotmail.com | simon       |
| 8  | yeah    | john@hotmail.com  | john        |
+----+---------+-------------------+-------------+
The desired result would be:
+----+-------+-------------------+-------------+
| id | title | email             | commentname |
+----+-------+-------------------+-------------+
| 3  | test  | rob@hotmail.com   | rob         |
| 7  | nice! | simon@hotmail.com | simon       |
| 8  | yeah  | john@hotmail.com  | john        |
+----+-------+-------------------+-------------+
I don't care which id value is returned.
What would be the required SQL?
Quick one in TSQL
SELECT a.*
FROM emails a
INNER JOIN
(SELECT email,
MIN(id) as id
FROM emails
GROUP BY email
) AS b
ON a.email = b.email
AND a.id = b.id;
I'm assuming you mean that you don't care which row is used to obtain the title, id, and commentname values (you have "rob" for all of the rows, but I don't know if that is actually something that would be enforced or not in your data model). If so, then you can use windowing functions to return the first row for a given email address:
select
id,
title,
email,
commentname
from
(
select
*,
row_number() over (partition by email order by id) as RowNbr
from YourTable
) source
where RowNbr = 1
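A more compact T-SQL form of the same ROW_NUMBER idea (a different idiom, not one of the posted answers) orders the whole result by the row number and keeps every row tied at 1 with TOP (1) WITH TIES. A sketch, assuming a hypothetical #Comments table loaded with the sample data:

```sql
DROP TABLE IF EXISTS #Comments;
-- Hypothetical table holding the sample rows from the question
CREATE TABLE #Comments (id int, title varchar(50), email varchar(100), commentname varchar(50));
INSERT INTO #Comments VALUES
    (3, 'test',    'rob@hotmail.com',   'rob'),
    (4, 'i agree', 'rob@hotmail.com',   'rob'),
    (5, 'its ok',  'rob@hotmail.com',   'rob'),
    (6, 'hey',     'rob@hotmail.com',   'rob'),
    (7, 'nice!',   'simon@hotmail.com', 'simon'),
    (8, 'yeah',    'john@hotmail.com',  'john');

-- Each email's lowest-id row gets row number 1, so all of them tie for "top 1"
SELECT TOP (1) WITH TIES id, title, email, commentname
FROM #Comments
ORDER BY ROW_NUMBER() OVER (PARTITION BY email ORDER BY id);
```

It saves the derived table, but the whole set is still numbered and sorted, so it is not necessarily faster than filtering on RowNbr = 1.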
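If you are using MySQL, you can select one record per group with GROUP BY and no aggregate functions (according to the MySQL documentation and a related SO Q&A), but only when the ONLY_FULL_GROUP_BY SQL mode is disabled. That mode is enabled by default from MySQL 5.7.5 onward, and with it disabled the values returned for the non-grouped columns are indeterminate.
So the query can be simplified to this:
select * from comments_table group by email;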
Since you don't care which id is returned, I stick with the MAX id for each email to keep the query simple. Give it a try:
;WITH ue(id)
AS
(
SELECT MAX(id)
FROM emails
GROUP BY email
)
SELECT t.* FROM emails t
INNER JOIN ue ON ue.id = t.id
Or, in MySQL:
SELECT * FROM emails GROUP BY email;
