What happens if the sub-query on an IN operator fails? - sql-server

I have a T-SQL query where I am using the IN operator to find all records where the GUID is in the result of the subquery. However, I recently made changes to the schema so that Table6 does not have a GUID field and now has an AlternateID field. So the subquery for the IN operator fails if you run it. However, if I execute the query as a whole, it always returns all records in TableGUIDResolving table. It's almost as if the IN operator is returning TRUE for all records because the subquery is failing.
I have tried fixing the subquery, and it executes as expected when I do this.
Can someone help me explain this? Is this behavior intentional?
SELECT ID
FROM TableGUIDResolving
WHERE GUID IN (SELECT AlternateID AS GUID FROM Table1
UNION
SELECT GUID FROM Table2
UNION
SELECT GUID FROM Table3
UNION
SELECT GUID FROM Table4
UNION
SELECT GUID FROM Table5
UNION
SELECT GUID FROM Table6)

Yup. That is what happens when you use subqueries without qualified column names. You think you are saying:
select table6.GUID from table6
but this doesn't exist, so the scoping rules in SQL change it to:
select TableGUIDResolving.GUID from table6
I would recommend that you change your logic to a series of EXISTS checks with qualified column names:
SELECT ID
FROM TableGUIDResolving tgr
WHERE EXISTS (SELECT 1 FROM Table1 t1 WHERE t1.AlternateID = tgr.GUID) OR
EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.GUID = tgr.GUID) OR
EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.GUID = tgr.GUID) OR
EXISTS (SELECT 1 FROM Table4 t4 WHERE t4.GUID = tgr.GUID) OR
EXISTS (SELECT 1 FROM Table5 t5 WHERE t5.GUID = tgr.GUID) OR
EXISTS (SELECT 1 FROM Table6 t6 WHERE t6.GUID = tgr.GUID)
If you have an index on GUID/AlternateID in each of the tables, then this should have much better performance.
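The scoping rule is easy to reproduce on a small scale. The sketch below is not T-SQL: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, purely so it is self-contained, and the sample rows are made up. SQLite resolves the unqualified name against the outer table in the same way, although its error messages differ from SQL Server's.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableGUIDResolving (ID INTEGER PRIMARY KEY, GUID TEXT);
    CREATE TABLE Table6 (AlternateID TEXT);  -- note: no GUID column
    INSERT INTO TableGUIDResolving (ID, GUID) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table6 (AlternateID) VALUES ('zzz');  -- hypothetical sample row
""")

# 1. Run on its own, the subquery fails: Table6 has no GUID column.
try:
    conn.execute("SELECT GUID FROM Table6").fetchall()
    standalone_error = None
except sqlite3.OperationalError as exc:
    standalone_error = str(exc)

# 2. Nested in the outer query, the unqualified GUID binds to
#    TableGUIDResolving.GUID, so every outer row matches itself.
unqualified_ids = [row[0] for row in conn.execute("""
    SELECT ID
    FROM TableGUIDResolving
    WHERE GUID IN (SELECT GUID FROM Table6)
    ORDER BY ID
""")]

# 3. Qualifying the column turns the silent mismatch back into an error.
try:
    conn.execute("""
        SELECT ID
        FROM TableGUIDResolving
        WHERE GUID IN (SELECT t6.GUID FROM Table6 t6)
    """).fetchall()
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(standalone_error)
print(unqualified_ids)
print(qualified_error)
```

The "returns everything" result depends on Table6 holding at least one row; with an empty Table6 the subquery yields nothing and no outer row matches.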

Related

SQL get counts using subqueries from multiple linked tables

Suppose I have tables 1-4, and all the other tables are linked to table1. For what it's worth, table1, table2 and table3 are relatively small, but table4 contains a lot of data.
Now I have the following query:
SELECT t1.id
, (SELECT COUNT(*) FROM table2 WHERE table1_id = t1.id) AS t2_count
, (SELECT COUNT(*) FROM table3 WHERE table1_id = t1.id) AS t3_count
, (SELECT COUNT(*) FROM table4 WHERE table1_id = t1.id) AS t4_count
FROM table1 t1
Due to the fact that the subqueries are dependent/correlated I assumed that there must be a better way (performance wise) to get the data.
I tried the following, but it drastically increased the execution time (from about 2s to 35s). I'm guessing that the multiple left joins create a very big data set?
SELECT t1.id
, COUNT(t2.id) AS t2_count
, COUNT(t3.id) AS t3_count
, COUNT(t4.id) AS t4_count
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1_id = t1.id
LEFT JOIN table3 t3 ON t3.table1_id = t1.id
LEFT JOIN table4 t4 ON t4.table1_id = t1.id
GROUP BY t1.id
Is there better way to get the counts? I don't need the data from the other tables.
UPDATE:
Bart's answer got me thinking that the table1_id columns are nullable. I added an IS NOT NULL check to the WHERE clauses and this brought the time down to 1s.
SELECT t1.id
, (SELECT COUNT(*) FROM table2 WHERE table1_id IS NOT NULL AND table1_id = t1.id) AS t2_count
, (SELECT COUNT(*) FROM table3 WHERE table1_id IS NOT NULL AND table1_id = t1.id) AS t3_count
, (SELECT COUNT(*) FROM table4 WHERE table1_id IS NOT NULL AND table1_id = t1.id) AS t4_count
FROM table1 t1
I guess not. If you execute a SELECT COUNT(*) FROM [table], it should perform a count on the table's PK. That should be pretty fast, even for very large tables.
Is your table4 a real table (and not a view, or a table-valued function, or something else that looks like a table)? And does it have a primary key? If so, I don't think that the performance of a SELECT COUNT(*) FROM [table4] query can be increased significantly.
It may also be the case that your table4 is heavily targeted (by concurrent transactions over multiple connections), or perhaps your SQL Server is doing some heavy IO or computation. I cannot assume anything about that, however. You could check whether your query is also slow on a restored database backup on a physically separate test server.
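On the guess in the question that the multiple left joins create a very big data set: that is what happens when several child tables are joined to the same parent, and it changes the counts as well as the run time. The sketch below illustrates this with Python's sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server (an assumption, as are the sample rows). COUNT(DISTINCT ...) is shown only as one possible repair; nobody in the thread proposed it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    INSERT INTO table1 (id) VALUES (1);
    INSERT INTO table2 (table1_id) VALUES (1), (1);       -- 2 child rows
    INSERT INTO table3 (table1_id) VALUES (1), (1), (1);  -- 3 child rows
""")

# Correlated subqueries: each count is computed independently.
correlated = conn.execute("""
    SELECT t1.id
         , (SELECT COUNT(*) FROM table2 WHERE table1_id = t1.id) AS t2_count
         , (SELECT COUNT(*) FROM table3 WHERE table1_id = t1.id) AS t3_count
    FROM table1 t1
""").fetchall()

# Chained LEFT JOINs: 2 x 3 = 6 joined rows for the parent, so both counts are 6.
joined = conn.execute("""
    SELECT t1.id, COUNT(t2.id) AS t2_count, COUNT(t3.id) AS t3_count
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.table1_id = t1.id
    LEFT JOIN table3 t3 ON t3.table1_id = t1.id
    GROUP BY t1.id
""").fetchall()

# COUNT(DISTINCT ...) repairs the numbers but still builds the multiplied rows.
joined_distinct = conn.execute("""
    SELECT t1.id, COUNT(DISTINCT t2.id) AS t2_count, COUNT(DISTINCT t3.id) AS t3_count
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.table1_id = t1.id
    LEFT JOIN table3 t3 ON t3.table1_id = t1.id
    GROUP BY t1.id
""").fetchall()

print(correlated, joined, joined_distinct)
```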

Implement Bitwise OR instead of multiple In clause in sql query

I have a query which uses IN clause (can use EXISTS also) for multiple columns which are filtered using OR Clause inside WHERE Clause. Is there any better approach to write this query.
SELECT columndata FROM TABLE1
WHERE column1key in (select columnkey from #temptable1)
OR column2key in (select columnkey from #temptable2)
OR column3key IN (SELECT columnkey FROM #temptable3)
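You can also use LEFT JOIN, as shown below. The WHERE clause keeps only the rows that matched at least one temp table; add DISTINCT if the temp tables can contain duplicate keys.
SELECT columndata
FROM TABLE1 tab1
LEFT JOIN #temptable1 t1 on tab1.column1key = t1.columnkey
LEFT JOIN #temptable2 t2 on tab1.column2key = t2.columnkey
LEFT JOIN #temptable3 t3 on tab1.column3key = t3.columnkey
WHERE t1.columnkey IS NOT NULL
OR t2.columnkey IS NOT NULL
OR t3.columnkey IS NOT NULL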
You may get better performance from this version, which breaks the SELECT into separate queries and de-duplicates the results at the end.
SELECT columndata FROM TABLE1
WHERE column1key in (select columnkey from #temptable1)
UNION
SELECT columndata FROM TABLE1
WHERE column2key in (select columnkey from #temptable2)
UNION
SELECT columndata FROM TABLE1
WHERE column3key IN (SELECT columnkey FROM #temptable3)
But you would really have to try it.
With no indexes or bad ones, you still have to scan the same amount of data. With good indexes, this may work better...
As a side note, EXISTS and IN will give the same plan here
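One thing to check before switching to the UNION form: UNION removes duplicate columndata values, so it can return fewer rows than the OR query when several TABLE1 rows share the same columndata. The sketch below shows this with Python's sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. Plain tables named temptable1..3 replace the #temp tables, and all sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (columndata TEXT, column1key INTEGER,
                         column2key INTEGER, column3key INTEGER);
    CREATE TABLE temptable1 (columnkey INTEGER);
    CREATE TABLE temptable2 (columnkey INTEGER);
    CREATE TABLE temptable3 (columnkey INTEGER);
    -- two rows deliberately share columndata = 'x'
    INSERT INTO TABLE1 VALUES ('x', 1, 0, 0), ('x', 0, 2, 0),
                              ('y', 1, 2, 0), ('z', 0, 0, 0);
    INSERT INTO temptable1 VALUES (1);
    INSERT INTO temptable2 VALUES (2);
    INSERT INTO temptable3 VALUES (3);
""")

# Original form: one pass over TABLE1, every qualifying row is kept.
or_rows = sorted(row[0] for row in conn.execute("""
    SELECT columndata FROM TABLE1
    WHERE column1key IN (SELECT columnkey FROM temptable1)
       OR column2key IN (SELECT columnkey FROM temptable2)
       OR column3key IN (SELECT columnkey FROM temptable3)
"""))

# UNION form: the final de-duplication works on columndata values.
union_rows = sorted(row[0] for row in conn.execute("""
    SELECT columndata FROM TABLE1
    WHERE column1key IN (SELECT columnkey FROM temptable1)
    UNION
    SELECT columndata FROM TABLE1
    WHERE column2key IN (SELECT columnkey FROM temptable2)
    UNION
    SELECT columndata FROM TABLE1
    WHERE column3key IN (SELECT columnkey FROM temptable3)
"""))

print(or_rows)     # 'x' appears once per qualifying row
print(union_rows)  # each columndata value appears once
```

If columndata is unique in TABLE1, the two forms return the same set of values.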

Find missing values on the same column of two tables

Suppose you have two tables in a SQL Server database with the same schema for both tables. I want to compare a single column on both tables and find the values that are missing in table1 but are in table2. I've been doing this manually in Excel with a macro after I've gotten a distinct list in each query, but it would be less work if I had a query. How can I find the missing records via T-SQL? I'd like to do this for the following data types: datetime, nvarchar & bigint.
SELECT DISTINCT [dbo].[table1].[column1]
FROM [dbo].[table1]
ORDER BY [dbo].[table1].[column1] DESC
SELECT DISTINCT [dbo].[table2].[column1]
FROM [dbo].[table2]
ORDER BY [dbo].[table2].[column1] DESC
There are several ways you can do this...
LEFT JOIN:
SELECT DISTINCT t2.column1
FROM dbo.table2 t2
LEFT JOIN dbo.table1 t1
ON t2.Column1 = t1.Column1
WHERE t1.Column1 IS NULL
NOT EXISTS:
SELECT DISTINCT t2.column1
FROM dbo.table2 t2
WHERE NOT EXISTS (
SELECT 1
FROM dbo.table1 t1
WHERE t1.column1 = t2.column1
)
NOT IN:
SELECT DISTINCT t2.column1
FROM dbo.table2 t2
WHERE t2.column1 NOT IN (
SELECT t1.column1
FROM dbo.table1 t1
)
There are some slight variations in the behavior and efficiency of these approaches... based mostly on the presence of NULL values in columns, so try each approach to find the most efficient one that gives the results you expect.
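The NULL-related difference mentioned above is worth seeing once. The sketch below runs the LEFT JOIN, NOT EXISTS, NOT IN and EXCEPT variants on a made-up data set in which table1.column1 contains a NULL. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption). The three-valued logic behind NOT IN is standard SQL, so the same outcome is expected in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER);
    CREATE TABLE table2 (column1 INTEGER);
    INSERT INTO table1 VALUES (1), (NULL);   -- note the NULL
    INSERT INTO table2 VALUES (1), (2), (3);
""")

def missing(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

left_join = missing("""
    SELECT DISTINCT t2.column1
    FROM table2 t2
    LEFT JOIN table1 t1 ON t2.column1 = t1.column1
    WHERE t1.column1 IS NULL
""")

not_exists = missing("""
    SELECT DISTINCT t2.column1
    FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.column1 = t2.column1)
""")

# 2 NOT IN (1, NULL) evaluates to UNKNOWN, not TRUE, so the row is filtered out.
not_in = missing("""
    SELECT DISTINCT t2.column1
    FROM table2 t2
    WHERE t2.column1 NOT IN (SELECT t1.column1 FROM table1 t1)
""")

except_rows = missing("""
    SELECT column1 FROM table2
    EXCEPT
    SELECT column1 FROM table1
""")

print(left_join, not_exists, not_in, except_rows)
```

If NOT IN is the preferred form, adding WHERE t1.column1 IS NOT NULL inside the subquery brings it back in line with the other variants.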
SELECT DISTINCT [dbo].[table2].[column1]
FROM [dbo].[table2]
except
SELECT DISTINCT [dbo].[table1].[column1]
FROM [dbo].[table1]
All the values of column1 in Table2 that are not present in column1 of Table1
Basically, you can use LEFT JOIN.
TableB is the main table in this case. When it is joined to TableA with a LEFT JOIN, the records that have no match in TableA still appear in the result, but their TableA columns are NULL. So, to keep only the non-matching records, add a filter that selects rows where the TableA column is NULL.
SELECT b.*
FROM tableB b
LEFT JOIN tableA a
ON a.column1 = b.column1
WHERE a.column1 IS NULL
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
SQL Server 2005 onwards you could use Except
SELECT DISTINCT [dbo].[table2].[column1]
FROM [dbo].[table2]
Except
SELECT DISTINCT [dbo].[table1].[column1]
FROM [dbo].[table1]

The effect of a select with multiple tables in FROM is the same as INNER JOIN but what is the ON clause then?

I have a query like this:
SELECT
*
FROM
table1,
table2
I know this is somewhat equivalent to:
SELECT
*
FROM
table1
INNER JOIN
table2
ON ???
However, what would be the resulting ON clause for the join?
Update
After some testing in SSMS here are my findings
SELECT * FROM table1,table2
gives the same execution plan and the same records as
SELECT * FROM table1 INNER JOIN table2 ON 1=1
and the same thing for
SELECT * FROM table1 CROSS JOIN table2
For an INNER JOIN, the ON clause names the column that defines their relationship.
SELECT *
FROM table1
INNER JOIN table2
ON table1.ID = table2.ID
Actually, the two queries you have shown are not equivalent. The first one produces the Cartesian product of all the records in both tables, in other words a CROSS JOIN.
SELECT
*
FROM
table1,
table2
is equivalent to:
SELECT
*
FROM
table1
CROSS JOIN
table2
There is no ON clause with a CROSS JOIN. If you need to filter a CROSS JOIN, put the condition in the WHERE clause:
WHERE table1.DateCreated <= table2.DateModified
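The equivalence of the three spellings can be checked on a toy data set. The sketch below uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption), with invented columns a and b. It compares returned rows only, not execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (b TEXT);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES ('x'), ('y');
""")

# Comma-separated FROM list: every row of table1 paired with every row of table2.
comma = sorted(conn.execute("SELECT * FROM table1, table2").fetchall())

# INNER JOIN with a condition that is always true.
inner = sorted(conn.execute(
    "SELECT * FROM table1 INNER JOIN table2 ON 1=1").fetchall())

# Explicit CROSS JOIN, which takes no ON clause.
cross = sorted(conn.execute(
    "SELECT * FROM table1 CROSS JOIN table2").fetchall())

print(len(comma))                # 3 rows x 2 rows
print(comma == inner == cross)
```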

How to use BETWEEN in a reference table

I was wondering how to write a query that checks whether a column's value falls within a range defined in a reference table,
such as
SELECT *
FROM Table1 WHERE Column1 BETWEEN ( SELECT Column1 , Column2 FROM TABLE2 )
I just don't know how to implement it in a correct way.
Thank you.
If you can have overlapping ranges in Table2, and all you want are (unique) Table1 records that are in any range in Table2, then this query will do it.
SELECT *
FROM Table1
WHERE EXISTS (
SELECT *
FROM Table2
Where Table1.Column1 BETWEEN Table2.Column1 and Table2.Column2)
You can also solve this using JOINs, if the ranges in Table2 are not overlapping, otherwise you will need to use either DISTINCT or ROW_NUMBER() to pare them down to unique Table1 records.
Try this:
SELECT *
FROM Table1 as t1
INNER JOIN Table2 t2 ON t1.Column1 BETWEEN t2.Column1 AND t2.Column2
This also works:
SELECT * FROM table1 as t1,table2 as t2
WHERE t1.Column1 BETWEEN t2.Column1 AND t2.Column2
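The point about overlapping ranges is easier to see with data. In the sketch below, Table2 holds two overlapping ranges, so a Table1 value inside both comes back once from the EXISTS form and twice from the JOIN form. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption), and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Column1 INTEGER);
    CREATE TABLE Table2 (Column1 INTEGER, Column2 INTEGER);
    INSERT INTO Table1 VALUES (5), (15), (50);
    INSERT INTO Table2 VALUES (1, 10), (4, 20);  -- overlapping ranges
""")

# EXISTS: each Table1 row is returned at most once.
exists_rows = [row[0] for row in conn.execute("""
    SELECT Column1
    FROM Table1
    WHERE EXISTS (
        SELECT *
        FROM Table2
        WHERE Table1.Column1 BETWEEN Table2.Column1 AND Table2.Column2)
    ORDER BY Column1
""")]

# JOIN: one output row per matching range, so 5 appears twice.
join_rows = [row[0] for row in conn.execute("""
    SELECT t1.Column1
    FROM Table1 AS t1
    INNER JOIN Table2 t2 ON t1.Column1 BETWEEN t2.Column1 AND t2.Column2
    ORDER BY t1.Column1
""")]

# JOIN plus DISTINCT pares the result back down to unique Table1 values.
join_distinct_rows = [row[0] for row in conn.execute("""
    SELECT DISTINCT t1.Column1
    FROM Table1 AS t1
    INNER JOIN Table2 t2 ON t1.Column1 BETWEEN t2.Column1 AND t2.Column2
    ORDER BY t1.Column1
""")]

print(exists_rows, join_rows, join_distinct_rows)
```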

Resources