T-SQL: finding elements not matched from a list of keys


SQL Fiddle: http://sqlfiddle.com/#!6/52c67/1
CREATE TABLE MailingList (EmployeeId INT, Email VARCHAR(50))
INSERT INTO MailingList VALUES (1, 'bob@co.com')
INSERT INTO MailingList VALUES (2, 'jill@co.com')
INSERT INTO MailingList VALUES (3, 'frank@co.com')
INSERT INTO MailingList VALUES (4, 'fred@co.com')
Now I get a list of EmployeeIds from somewhere: 1,2,3,4,5
I need to check which of these EmployeeIds are NOT in the MailingList table. I expect to get the result "5" in this case, as it is not in the MailingList table.
What is the easiest way to do this?
Is there an easier way than creating a temporary table, inserting the values 1,2,3,4,5, and then doing either a select ... where not in (select ...) or the equivalent with a join? In other words, I would like to work directly with the list 1,2,3,4,5, without creating a temporary table and inserting the data.

Everyone is on the right track here with the idea of an ANTI JOIN. It's worth noting, however, that the proposed answers will not always produce exactly the same results, and each solution has different performance implications. What MatBailie is proposing is how to do an ANTI JOIN; what Alexander is proposing is how to do an ANTI SEMI JOIN.
Alexander is more on the right track IMO, as what we're looking for is an ANTI SEMI JOIN; a LEFT ANTI SEMI JOIN, to be specific, with your list of employeeIds from "somewhere" as the Left table and MailingList as the Right table.
An ANTI JOIN returns records that exist in this set but don't exist in that set. By "set" I mean a table, view, subquery, etc. "This" set is the LEFT table and "that" set is the RIGHT table. A SEMI JOIN is where only one matching row from the LEFT table is returned. In other words, a SEMI JOIN returns a distinct set.
Now I get a list of EmployeeIds from somewhere
Using the sample data provided, let's say that by "somewhere" you are talking about a table. (I'm including the number 5 twice to demonstrate the difference between an ANTI JOIN and an ANTI SEMI JOIN.)
CREATE TABLE dbo.somewhere (employeeId int);
INSERT dbo.somewhere VALUES (1),(2),(3),(4),(5),(5);
You could do a LEFT ANTI JOIN using NOT IN or NOT EXISTS
-- ANTI JOIN USING NOT IN
SELECT somewhere.EmployeeId--, <other columns>
FROM dbo.somewhere
WHERE somewhere.EmployeeId NOT IN (SELECT EmployeeId FROM dbo.MailingList); -- exclude IDs that are in MailingList
-- ANTI JOIN USING NOT EXISTS
SELECT somewhere.EmployeeId--, <other columns>
FROM dbo.somewhere
WHERE NOT EXISTS
(
SELECT EmployeeId
FROM dbo.MailingList ML
WHERE ML.EmployeeId = somewhere.employeeId
);
Note that each of these returns the number 5 twice. If you only needed it once, you would use EXCEPT to perform an ANTI SEMI JOIN like so:
SELECT somewhere.EmployeeId
FROM dbo.somewhere
EXCEPT -- SET OPERATOR (SET OPERATORS INCLUDE: UNION, UNION ALL, EXCEPT, INTERSECT)
SELECT EmployeeId
FROM dbo.MailingList; -- exclude IDs that are in MailingList
EXCEPT is a set operator like UNION and INTERSECT. Set operators return a unique result set (the one exception being UNION ALL). If you wanted a unique result set using NOT IN or NOT EXISTS, you would also need to include DISTINCT, or GROUP BY all the columns that you want to be unique.
If by "somewhere" you are talking about a comma-delimited list or an XML or JSON file/fragment, then you would first need to turn that list, XML, JSON or whatever into the LEFT table. Using SQL Server 2016's string_split (or another "splitter" function) you would do this:
-- "somewhere" is a csv, list or array
DECLARE @somewhere varchar(1000) = '1,2,3,4,5';
-- ANTI JOIN WITH NOT IN
SELECT EmployeeId = [value]
FROM string_split(@somewhere, ',')
WHERE [value] NOT IN (SELECT EmployeeId FROM dbo.MailingList);
-- ANTI SEMI JOIN WITH NOT IN
SELECT DISTINCT EmployeeId = [value]
FROM string_split(@somewhere, ',')
WHERE [value] NOT IN (SELECT EmployeeId FROM dbo.MailingList);
-- ANTI SEMI JOIN WITH EXCEPT
SELECT EmployeeId = [value]
FROM string_split(@somewhere, ',')
EXCEPT
SELECT EmployeeId FROM dbo.MailingList;
GO
...or, if it were XML, one option would look like this:
-- "somewhere" is XML
DECLARE @somewhere XML =
'<employees>
<employee>1</employee>
<employee>2</employee>
<employee>3</employee>
<employee>4</employee>
<employee>5</employee>
</employees>'
-- ANTI SEMI JOIN using EXCEPT
SELECT employeeId = emp.id.value('.', 'int')
FROM (VALUES (@somewhere)) s(empid)
CROSS APPLY empid.nodes('/employees/employee') emp(id)
EXCEPT
SELECT employeeId
FROM dbo.MailingList;
Lastly, you want an index on EmployeeId in your mailing list table. In my examples you would want an index on dbo.somewhere as well. If you are doing SEMI joins, then you want those indexes to be unique.

You can use the EXCEPT operator.
Example:
SELECT *
FROM
(
SELECT 1 AS Id
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) AS t
EXCEPT
SELECT Id FROM MailingList

You don't seem to be asking about the logic, just about how to "best" represent the set {1,2,3,4,5}.
One answer is a temporary table, as you mentioned.
Another is a sub-query or a CTE with a bunch of UNION ALL statements.
Another would be to use VALUES (1), (2), (3), (4), (5) in either a CTE or sub-query.
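As a rough sketch of the VALUES option, here is how the list could be used inline against the question's sample data. The table variable @MailingList and the alias ids are only there to make the script runnable on its own; they are not part of the original question.

```sql
-- Sample data from the question, held in a table variable so the script runs on its own.
DECLARE @MailingList TABLE (EmployeeId int, Email varchar(50));
INSERT INTO @MailingList VALUES
    (1, 'bob@co.com'), (2, 'jill@co.com'), (3, 'frank@co.com'), (4, 'fred@co.com');

-- The list 1,2,3,4,5 as an inline derived table; "ids" is just an illustrative alias.
SELECT ids.EmployeeId
FROM (VALUES (1), (2), (3), (4), (5)) AS ids (EmployeeId)
WHERE NOT EXISTS
(
    SELECT 1
    FROM @MailingList AS ml
    WHERE ml.EmployeeId = ids.EmployeeId
);
-- returns a single row: 5
```

The table value constructor in a FROM clause needs SQL Server 2008 or later; on older versions the UNION ALL form does the same job.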
But there is a glaring point here. If you have a table with an EmployeeID field, then surely you have an Employee table? That being the case you should be able to "derive" your set of 5 employees from there?
(SELECT id FROM employee WHERE manager_id = 666)
or...
(SELECT id FROM employee WHERE staff_ref IN ('111', '222', '333', '444', '555'))
etc, etc...
EDIT:
As for the actual logic once you have your set representing your 5 employees, you can do an "anti-join" using LEFT JOIN and IS NULL...
SELECT
Employee.*
FROM
Employee
LEFT JOIN
MailingList
ON MailingList.list_id = 789
AND MailingList.employee_id = Employee.id
WHERE
Employee.manager_id = 666
AND MailingList.employee_id IS NULL
=> Employees with manager #666 but not on mailing list #789

Related

SQL Server : DELETE FROM table FROM table

I keep coming across this DELETE FROM FROM syntax in SQL Server, and having to remind myself what it does.
DELETE FROM tbl
FROM #tbl
INNER JOIN tbl ON fk = pk AND DATEDIFF(day, #tbl.date, tbl.Date) = 0
EDIT: To make most of the comments and suggested answers make sense, the original question had this query:
DELETE FROM tbl
FROM tbl2

As far as I understand, you would use a structure like this when you are restricting which rows to delete from the first table based on the results of the second FROM clause. But to do that you need a correlation between the two.
In your example there is no correlation, so it effectively becomes a kind of cross join, which means "for every row in tbl2, delete every row in tbl". In other words, it will delete every row in the first table.
Here is an example:
declare @t1 table(A int, B int)
insert @t1 values (15, 9)
,(30, 10)
,(60, 11)
,(70, 12)
,(80, 13)
,(90, 15)
declare @t2 table(A int, B int)
insert @t2 values (15, 9)
,(30, 10)
,(60, 11)
delete from @t1 from @t2
The result is an empty @t1.
On the other hand this would delete just the matching rows:
delete from @t1 from @t2 t2 join @t1 t1 on t1.A=t2.A
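If you want to see exactly which rows a joined DELETE removes, one option (not mentioned in the question, just a sketch) is to add an OUTPUT clause. The table variables below are a cut-down copy of the sample data, redeclared so the script runs on its own.

```sql
DECLARE @t1 TABLE (A int, B int);
INSERT @t1 VALUES (15, 9), (30, 10), (60, 11), (70, 12);

DECLARE @t2 TABLE (A int, B int);
INSERT @t2 VALUES (15, 9), (30, 10);

-- Correlated delete: only rows of @t1 that have a match in @t2 are removed.
-- OUTPUT returns the removed rows (A = 15 and A = 30 here).
DELETE t1
OUTPUT deleted.A, deleted.B
FROM @t1 AS t1
INNER JOIN @t2 AS t2
    ON t1.A = t2.A;

-- Rows with A = 60 and A = 70 remain.
SELECT A, B FROM @t1;
```

Here the alias t1 is used as the delete target, which avoids naming the table twice.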

I haven't seen this anywhere before. The documentation of DELETE tells us:
FROM table_source Specifies an additional FROM clause. This
Transact-SQL extension to DELETE allows specifying data from
<table_source> and deleting the corresponding rows from the table in
the first FROM clause.
This extension, specifying a join, can be used instead of a subquery
in the WHERE clause to identify rows to be removed.
Later in the same document we find
D. Using joins and subqueries to data in one table to delete rows in
another table The following examples show two ways to delete rows in
one table based on data in another table. In both examples, rows from
the SalesPersonQuotaHistory table in the AdventureWorks2012 database
are deleted based on the year-to-date sales stored in the SalesPerson
table. The first DELETE statement shows the ISO-compatible subquery
solution, and the second DELETE statement shows the Transact-SQL FROM
extension to join the two tables.
With these examples to demonstrate the difference
-- SQL-2003 Standard subquery
DELETE FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID IN
(SELECT BusinessEntityID
FROM Sales.SalesPerson
WHERE SalesYTD > 2500000.00);
-- Transact-SQL extension
DELETE FROM Sales.SalesPersonQuotaHistory
FROM Sales.SalesPersonQuotaHistory AS spqh
INNER JOIN Sales.SalesPerson AS sp
ON spqh.BusinessEntityID = sp.BusinessEntityID
WHERE sp.SalesYTD > 2500000.00;
The second FROM mentions the same table in this case. This is a weird way to get something similar to an updatable CTE or a derived table.
In the third sample in section D the documentation states clearly:
-- No need to mention target table more than once.
DELETE spqh
FROM
Sales.SalesPersonQuotaHistory AS spqh
INNER JOIN Sales.SalesPerson AS sp
ON spqh.BusinessEntityID = sp.BusinessEntityID
WHERE sp.SalesYTD > 2500000.00;
So I get the impression, the sole reason for this was to use the real table's name as the DELETE's target instead of an alias.

2 nvarchar fields are not matching though the data is same?

I want to join 2 tables using an Inner Join on 2 columns, both are of (nvarchar, null) type. The 2 columns have the same data in them, but the join condition is failing. I think it is due to the spaces contained in the column values.
I have tried LTRIM, RTRIM also
My query:
select
T1.Name1, T2.Name2
from
Table1 T1
Inner Join
Table2 T2 on T1.Name1 = T2.Name2
I have also tried like this:
on LTRIM(RTRIM(T1.Name1)) = LTRIM(RTRIM(T2.Name2))
My data:
Table1 Table2
------ ------
Name1(Column) Name2(Column)
----- ------
Employee Data Employee Data
Customer Data Customer Data
When I check my data in the two tables with
select T1.Name1, len(T1.Name1) as Length1, Datalength(T1.Name1) as DataLength1 from Table1 T1
select T2.Name2, len(T2.Name2) as Length2, Datalength(T2.Name2) as DataLength2 from Table2 T2
the Length and DataLength values are different between the two tables.
I can't change the original data in the two tables. How can I fix this issue?
Thank You

Joins do not have special rules for equality; the equality operator always works the same way. If a = b were true, the join on a = b would match. It does not match, therefore a <> b.
Check the contents of those fields. They will not be the same although you think they are:
select convert(varbinary(max), myCol) from T
Unicode has invisible characters (that only ever seem to cause trouble).
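To make the hidden characters easier to spot than in a raw varbinary dump, a query along these lines lists the code point of every character. The value of @s is a made-up example that contains a no-break space; in practice you would plug in a value from your own column.

```sql
-- Hypothetical value: looks like 'Employee Data' but contains a no-break space (code point 160).
DECLARE @s nvarchar(50) = N'Employee' + NCHAR(160) + N'Data';

SELECT n.pos,
       SUBSTRING(@s, n.pos, 1)          AS ch,
       UNICODE(SUBSTRING(@s, n.pos, 1)) AS code_point
FROM
(
    -- One row per character; DATALENGTH / 2 counts nvarchar characters, trailing spaces included.
    SELECT TOP (DATALENGTH(@s) / 2)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS pos
    FROM sys.all_objects
) AS n
ORDER BY n.pos;
-- position 9 reports code_point 160 instead of 32 (an ordinary space)
```

Once you know which code point is the culprit, you can normalise it in the join condition, for example with REPLACE(T1.Name1, NCHAR(160), N' '), without touching the stored data.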

declare @t table (name varchar(20))
insert into @t(name)values ('Employee Data'),('Customer Data')
declare @tt table (name varchar(20))
insert into @tt(name)values ('EmployeeData'),('CustomerData')
select t.name,tt.name from @t t
INNER JOIN @tt tt
ON RTRIM(LTRIM(REPLACE(t.name,' ',''))) = RTRIM(LTRIM(REPLACE(tt.name,' ','')))

I would follow this plan:
1. Create a new table to store all the possible names
2. Add the needed keys and indexes
3. Populate it with the existing names
4. Add columns to your existing tables to store the key of the name
5. Create the corresponding foreign keys
6. Populate the new columns with the correct keys
7. Create a procedure that inserts into the new names table only when the value does not already exist
8. Perform the join using the new keys

SQL Server can you use EXCEPT or INTERSECT and ignore a column?

Here's my question,
CREATE TABLE
#table1(ID int, Fruit varchar(50), Veg varchar(50))
INSERT INTO #table1 (ID,Fruit,Veg)
VALUES (1,'Apple', 'Potato')
CREATE TABLE
#table2(ID int, Fruit varchar(50), Veg varchar(50))
INSERT INTO #table2 (ID,Fruit,Veg)
VALUES (2,'Apple', 'Potato')
SELECT * FROM #table1 INTERSECT SELECT * FROM #table2
I have two tables and I want to find rows which are the same in both, but both tables have different and unrelated ID columns. Is there any way to use INTERSECT or EXCEPT on two tables, but ignore the ID in the comparison?
I need to keep the IDs on the returned rows, so in the example above, two rows would be returned, one with ID = 1 and another with ID = 2.
If anything other than the IDs is different, then nothing would be returned.
Thanks!

I don't think this can be done with INTERSECT. Maybe with a join instead?
SELECT t1.id, t2.id, t1.veg, t1.fruit
FROM Table1 as t1
INNER JOIN Table2 as t2
ON t1.veg = t2.veg AND t1.fruit = t2.fruit
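If you need one returned row per ID, as the question describes, a possible variation is two EXISTS queries glued together with UNION ALL. This is only a sketch: the table variables stand in for the temp tables from the question, and the extra Pear row is added to show a non-match.

```sql
DECLARE @table1 TABLE (ID int, Fruit varchar(50), Veg varchar(50));
DECLARE @table2 TABLE (ID int, Fruit varchar(50), Veg varchar(50));
INSERT INTO @table1 VALUES (1, 'Apple', 'Potato'), (3, 'Pear', 'Leek');
INSERT INTO @table2 VALUES (2, 'Apple', 'Potato');

-- Rows of @table1 that have a twin in @table2, ignoring ID ...
SELECT t1.ID, t1.Fruit, t1.Veg
FROM @table1 AS t1
WHERE EXISTS (SELECT 1 FROM @table2 AS t2
              WHERE t2.Fruit = t1.Fruit AND t2.Veg = t1.Veg)
UNION ALL
-- ... plus rows of @table2 that have a twin in @table1.
SELECT t2.ID, t2.Fruit, t2.Veg
FROM @table2 AS t2
WHERE EXISTS (SELECT 1 FROM @table1 AS t1
              WHERE t1.Fruit = t2.Fruit AND t1.Veg = t2.Veg);
-- returns (1, Apple, Potato) and (2, Apple, Potato); the Pear row has no match
```

Note that plain = comparisons never match NULLs, whereas INTERSECT treats two NULLs as equal, so the two approaches differ if Fruit or Veg can be NULL.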

SQL WHERE NOT EXISTS (skip duplicates)

Hello, I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows, including duplicates.
This is my sql
DECLARE @tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO @tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM @tmp
WHERE Name=CustomerName
AND Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)

DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1

First, I thought that @David's answer is what you want. But rereading your comments, perhaps you want all combinations of names and surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;

Are you doing that while your @tmp table is still empty?
If so: the entire SELECT is fully evaluated before the INSERT happens. It does not work as "run the query and get one row, insert the row, run the query and get another row, insert the row, etc."
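A small script shows the effect. The table variables and names here are invented for the demonstration; the point is only that the NOT EXISTS check looks at the target table as it was before the statement started.

```sql
DECLARE @src TABLE (Name varchar(100));
INSERT INTO @src VALUES ('Ann'), ('Ann'), ('Bob');

DECLARE @dst TABLE (Name varchar(100));

-- @dst is empty while the SELECT is evaluated, so NOT EXISTS is true for every source row.
INSERT INTO @dst (Name)
SELECT s.Name
FROM @src AS s
WHERE NOT EXISTS (SELECT 1 FROM @dst AS d WHERE d.Name = s.Name);

SELECT COUNT(*) AS inserted_rows FROM @dst;
-- 3: both 'Ann' rows were inserted
```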
If you want to insert unique Customers only, use that same "Customer" table in your not exists clause
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct".

Typically, if you're doing a WHERE NOT EXISTS or WHERE EXISTS, or WHERE NOT IN subquery,
you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both inside and outside tables (where inside table is joined to outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
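One commonly cited reason for that preference (my addition, not something stated above) is how NOT IN behaves when the subquery can return NULL. A minimal sketch with made-up table variables:

```sql
DECLARE @keys TABLE (id int NULL);
DECLARE @list TABLE (id int NULL);
INSERT INTO @keys VALUES (1), (2);
INSERT INTO @list VALUES (1), (NULL);

-- NOT IN: "2 <> NULL" evaluates to UNKNOWN, so no row qualifies at all.
SELECT k.id
FROM @keys AS k
WHERE k.id NOT IN (SELECT l.id FROM @list AS l);
-- returns no rows

-- NOT EXISTS: the NULL in @list simply never matches.
SELECT k.id
FROM @keys AS k
WHERE NOT EXISTS (SELECT 1 FROM @list AS l WHERE l.id = k.id);
-- returns 2
```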
Sometimes, if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in the column), you could do the same thing by joining the main query with that table on the column you want in the EXISTS. That is not always the best or safest solution: it might make the query slower if there are many rows in that table, and it could cause many duplicate rows if there are duplicate values for that column in the joined table. In that case you'd have to add DISTINCT to the main query, which causes it to sort the data on all columns. Not efficient at all.
And, similarly, the WHERE NOT IN or NOT EXISTS correlated subqueries can be accomplished (and often give the same execution plan) if you LEFT OUTER JOIN the table you were going to subquery and add a WHERE <table>.<column> IS NULL.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing down to an SP an XML with name-value pairs that I put into a table variable, let's say @nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) with those exact match of name-value pairs (attributes, another table) associated.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which works but it's painfully slow) this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = @noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!

I believe that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
SET @noOfAttributes = (select count(*) from @nameValuePairs)
SELECT e.expressionid
FROM expressions e
LEFT JOIN (
SELECT a.attributeid
FROM attributes a
JOIN @nameValuePairs nvp ON nvp.name = a.attributeName AND nvp.value = a.attributeValue
) t ON t.attributeid = e.attributeid
GROUP BY e.expressionid
HAVING SUM(CASE WHEN t.attributeid IS NULL THEN (@noOfAttributes + 1) ELSE 1 END) = @noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that in to account.

One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?

This is not a bad approach, depending on the sizes and indexes of the tables, including @nameValuePairs. If these row counts are high or it otherwise becomes slow, you may do better to put @nameValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using; it would be faster to exclude them (though it would mean duplicate rows in #temp). Also, your second query has both a "distinct" and a "group by" on the same columns. You don't need both, so I would drop the "distinct" (it probably won't affect performance, because the optimizer already figured this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionid (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered index, and then use INSERT..SELECT to load it instead.
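Roughly, the "create first, index, then load" variant could look like the sketch below. The column types, the sample rows, the index name and the #expressions/#attributes stand-in tables are all assumptions made so the script runs on its own; substitute your real tables.

```sql
-- Hypothetical stand-ins for the real tables; column types are assumptions.
CREATE TABLE #expressions (expressionId int NOT NULL, attributeId int NOT NULL);
CREATE TABLE #attributes  (attributeId int NOT NULL,
                           attributeName varchar(50) NOT NULL,
                           attributeValue varchar(50) NOT NULL);
DECLARE @nameValuePairs TABLE (name varchar(50) NOT NULL, value varchar(50) NOT NULL);

INSERT INTO #expressions VALUES (1, 10), (1, 11), (2, 10);
INSERT INTO #attributes  VALUES (10, 'color', 'red'), (11, 'size', 'L');
INSERT INTO @nameValuePairs VALUES ('color', 'red'), ('size', 'L');

-- Create the work table and its clustered index before loading it.
CREATE TABLE #temp (expressionId int NOT NULL,
                    attributeName varchar(50) NOT NULL,
                    attributeValue varchar(50) NOT NULL);
CREATE CLUSTERED INDEX IX_temp_expressionId ON #temp (expressionId);

INSERT INTO #temp (expressionId, attributeName, attributeValue)
SELECT DISTINCT e.expressionId, a.attributeName, a.attributeValue
FROM #expressions AS e
JOIN #attributes AS a
    ON e.attributeId = a.attributeId
JOIN @nameValuePairs AS nvp
    ON a.attributeName = nvp.name AND a.attributeValue = nvp.value;

DECLARE @noOfAttributes int = (SELECT COUNT(*) FROM @nameValuePairs);

SELECT expressionId
FROM #temp
GROUP BY expressionId
HAVING COUNT(*) = @noOfAttributes;
-- returns expressionId 1 (it has both pairs; expression 2 only has one)

DROP TABLE #temp, #attributes, #expressions;
```

Whether the index pays off depends on the row counts, so it is worth comparing timings with and without it.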
I'll add an example of merging the queries in a minute... OK, here's one way to merge them into a single query (this should be 2000-compatible also):
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = @noOfAttributes
