Is this a SQL Parser bug? Or my misunderstanding? - sql-server

Using SQL Server 2008 R2 (Version 10.50.4000.0):
I have two tables. I want to delete data in one table where an ID exists in another table. Simple enough. However, due to a mistake in typing, I found what appears to be a parser bug.
In the script below, [SomeID] is a column in Table1 but does not actually exist in Table2:
Delete from Table1 where [SomeID] in (Select [SomeID] from Table2)
If you run the subquery "Select [SomeID] from Table2" on its own, you get an appropriate error message stating that the column does not exist.
If you run the whole delete query, though, it runs without error and deletes everything in Table1.
It seems that the parser should have caught that the column did not exist in Table2. I know you can reference columns from outside the sub-query, and I realize the parser assumed I meant the column from Table1; but since I had not referenced any columns from Table2 at all, the parser should, in my opinion, have been smart enough to know something was wrong. Fortunately, we were in a test environment when this happened. :)
Thanks,
Tony

It's not an error.
The engine does exactly what you put in the query.
When you use an IN clause, it compares, for each row, the field SomeID against the result set of the subquery.
For example, you could do this:
Delete from Table1 where [SomeID] in (Select [SomeID] from Table2 where IdTable2 = SomeId)
So the subquery can return anything: a constant, an ID from Table2, a sum of both, or, as in this case, the ID from the first table.
Here, since the subquery has no WHERE clause, it returns a result for every row (unless Table2 has no rows at all). But what does it return? You named a field from the parent query, which is in scope, so the subquery returns that value.
And of course, since every row's SomeID is trivially IN a set containing itself, everything in Table1 is deleted.
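The simple way to make this class of mistake fail loudly is to alias every table and qualify every column. A minimal sketch, assuming the schema from the question ([SomeID] exists in Table1 but not in Table2):
-- Qualified version of the same delete:
DELETE t1
FROM Table1 AS t1
WHERE t1.[SomeID] IN (SELECT t2.[SomeID] FROM Table2 AS t2);
-- This now fails with "Invalid column name 'SomeID'", because the
-- qualified t2.[SomeID] cannot silently bind to the outer table.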

Related

How do I Select an aggregate function from a temp table without getting the invalid column error from not including the column in the GROUP BY clause?

I performed aggregate functions in a temp table but I'm getting an error because the field I performed the aggregate function on is not included in a GROUP BY in the table I am selecting from. To clarify, this is just a snippet so these tables are temp tables in the larger query. They are also named in the actual code.
WITH #t1 AS
(SELECT
Name,
Date,
COUNT(Email),
COUNT(DISTINCT Email)
FROM SentEmails)
SELECT
#t1.*,
#t2.GrossSents
FROM #t1
--***JOINS***
GROUP BY
#t1.Name,
#t1.Date
I expect a table with Name, Date, Count of Emails, Unique Emails, and Gross Sends fields but I get
Column '#t1.COUNT(Email)' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Break your issue into steps.
Start by getting the query inside your CTE to return the data you expect from it. The query as written here won't run because you're doing aggregation without a GROUP BY clause.
Once that query is giving you the results you want, wrap it in the CTE syntax and try a SELECT * FROM cteName to see if that works. You'll get an error here because each column in a CTE has to have a name and your last two columns don't have names. Also, as noted in the comments, it's a poor practice to name your CTE with a #. It makes the subsequent code more confusing, since it appears as though there's a temp table someplace, and there isn't.
After you have the CTE returning what you need, start joining other tables, one at a time. Monitor those results as you add tables so you're sure that your JOINs are working as you expect.
If you're doing further aggregation on the outer query, specifying SELECT * is just asking for trouble because you're going to need to specify every non-aggregated column in your GROUP BY anyway. As a general rule, you should enumerate your columns in your SELECT, and in this case that will allow you to copy & paste them to your eventual GROUP BY.
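Putting those steps together, here is a hedged sketch of the corrected query. The column aliases, the GrossSends source table, and the join condition are assumptions, since those parts of the query weren't shown:
WITH EmailStats AS
(SELECT
    Name,
    [Date],
    COUNT(Email) AS EmailCount,             -- aliased so the CTE column has a name
    COUNT(DISTINCT Email) AS UniqueEmails   -- likewise
FROM SentEmails
GROUP BY Name, [Date])                      -- the aggregation needs its GROUP BY here
SELECT
    e.Name,
    e.[Date],
    e.EmailCount,
    e.UniqueEmails,
    g.GrossSents
FROM EmailStats AS e
JOIN GrossSends AS g                        -- hypothetical table; the real source wasn't shown
    ON g.Name = e.Name AND g.[Date] = e.[Date];
Note that the outer query no longer needs a GROUP BY at all: once the aggregation happens inside the CTE, the outer query is a plain join.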

sql query to find all the updated column of a table

I need a dynamic SQL query that can get the column values of a table on the basis of any case/condition. I want to do that whenever any record is updated, and I need the updated column values; for that I am using the Inserted and Deleted tables of SQL Server.
I made one query which works fine for one column, but I need a generic query that works for all of the columns.
SELECT i.name
FROM inserted i
INNER JOIN deleted d ON i.id = d.id
WHERE d.name <> i.name
With the help of the above query we can get the "Name" column's value if it was updated. But it is specific to one column; I want the same thing for all the columns of a table, with no need to name any column at all: a generic/dynamic query.
I am trying to achieve that by adding one more inner join with a PIVOT of INFORMATION_SCHEMA.COLUMNS for the columns, but I am not sure whether that can work.
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE '%table1%'
You can't get that kind of information using just a query. It sounds like you need to be running an update trigger; in there you can write the logic to get your columns, etc. From your question, it sounds like only one column can be updated and you are just not sure which one it will be. If multiple columns are being updated, that complicates things, but not by much.
There are several ways that you can get the data you need. Off the top, I'd suggest some simple looping once you get the column names.
What has worked for me in similar scenarios is to do a simple query against INFORMATION_SCHEMA.COLUMNS and put the results in a table variable. From there, iterate through the table variable to get one column name at a time and run a simple dynamic query to see if the value has changed; see the sketch below.
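Here is a hedged sketch of that loop inside an update trigger. The table dbo.table1 and key column id are taken from the question; everything else is an assumption. Note that dynamic SQL runs in its own scope and cannot reference inserted/deleted directly, so they are snapshotted into temp tables first:
CREATE TRIGGER trg_table1_update ON dbo.table1
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Dynamic SQL can't see inserted/deleted, but it can see temp tables.
    SELECT * INTO #i FROM inserted;
    SELECT * INTO #d FROM deleted;

    -- Column names into a table variable, as described above.
    DECLARE @cols TABLE (name sysname);
    INSERT @cols
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'table1'
      AND COLUMN_NAME <> 'id';              -- skip the key itself

    DECLARE @col sysname, @sql nvarchar(max);
    WHILE EXISTS (SELECT 1 FROM @cols)
    BEGIN
        SELECT TOP (1) @col = name FROM @cols;

        -- One small comparison per column. Note that <> misses
        -- NULL-to-value changes; a fuller version would handle those.
        SET @sql = N'SELECT ''' + @col + N''' AS ChangedColumn, i.id
                    FROM #i AS i JOIN #d AS d ON i.id = d.id
                    WHERE i.' + QUOTENAME(@col) + N' <> d.' + QUOTENAME(@col) + N';';
        EXEC sp_executesql @sql;

        DELETE FROM @cols WHERE name = @col;
    END;
END;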
Hope this helps.
:{)

Single condition slows down SQL query drastically

I have a SQL query looking something like this:
WITH Results_CTE AS
(SELECT
COLUMN1,
COLUMN2,
[MORE COLUMNS...],
ROW_NUMBER() OVER (ORDER BY R.RANKING DESC) AS RowNum
FROM TABLE1 As R, TABLE2 As A, TABLE3 As U, TABLE4 As S, TABLE5 As T
WHERE R.RID = A.LID
AND S.QRYID = R.QRYID
AND A.AID = U.AID
AND CONDITION1 = 'VALUE'
AND CONDITION2 = 'VALUE'
AND [MORE CONDITIONS...]
),
Results_Cnt AS
(SELECT COUNT(*) CNT FROM Results_CTE)
SELECT * FROM Results_CTE, Results_Cnt WHERE RowNum >= 1 AND RowNum <= 25
Now, this query typically runs in under 1 sec and returns 25 records out of 5000, based on CONDITION1.
Recently, though, I added a new column to TABLE1 and then used its values as CONDITION2 in the query above. The column is populated going forward, but all the values in the past are NULL.
I read something about joins on tables that have NULLs being a reason for slow execution. The table has about 1,300,000 records; 90% of them are NULL in the problematic column. But that column is not being joined on. (The one that is being joined on has an INDEX.)
However, I wanted to try that anyway by creating a new column and simply copying the data like so:
ALTER TABLE TABLE1 ADD COL_NEW INT;  -- same data type as COL_OLD (INT assumed here)
UPDATE TABLE1 SET COL_NEW = COL_OLD;
My next step was to replace the NULLs with an actual value, but first, just for kicks, I changed the query to use the new field COL_NEW as the condition, and the problem went away.
Although I'm happy the problem is gone, I can't explain it to myself. Why was the execution slow in the first place if it had nothing to do with the NULLs?
UPDATE: It appears the problem may have resulted from a cached query plan. So the question essentially becomes, how to force a query plan refresh?
UPDATE: Although doing ALTER TABLE may have refreshed the execution plan, the problem returned. How can I find out what is happening?
It sounds like your query plan got cached while the statistics for the new column showed it completely full of NULLs, forcing a table scan. Following the ALTER TABLE, the query plan was refreshed, replacing the table scan with an index lookup again, and performance returned to normal.
The only way to know for sure if that is what happened would be to examine the query plans for both queries, but those are long gone now.
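If it happens again, there are a few standard ways to force a plan refresh and test the theory; these are all stock SQL Server commands, shown here against the question's objects:
-- Recompile just this statement every time it runs
-- (shown as the tail of the paged query):
SELECT * FROM Results_CTE, Results_Cnt
WHERE RowNum >= 1 AND RowNum <= 25
OPTION (RECOMPILE);

-- Or mark every cached plan that touches the table for recompilation:
EXEC sp_recompile 'TABLE1';

-- Or refresh the statistics the optimizer built the plan from:
UPDATE STATISTICS TABLE1;
Comparing the actual execution plans of a fast run and a slow run (seek vs. scan on TABLE1) is the way to confirm or rule out the stale-plan explanation.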

Wrong case in subquery column name causes incorrect results, but no error

Using SQL Server Management Studio, I am getting some undesired results (looks like a bug to me..?)
If I use FIELD rather than field for the other_table:
SELECT * FROM main_table WHERE field IN (SELECT FIELD FROM other_table)
I get all results from main_table.
Using the correct case:
SELECT * FROM main_table WHERE field IN (SELECT field FROM other_table)
I get the expected results: the rows where field appears in other_table.
Running the subquery on its own:
SELECT FIELD FROM other_table
I get an invalid column name error.
Surely I should get this error in the first case?
Is this related to collation?
The DB is binary collation.
The server is case insensitive however.
It seems to me like the server component is saying "this code is OK" and not allowing the DB to say that the field name is wrong..?
What are my options for a solution?
Let's illustrate what is happening using something that doesn't depend on case sensitivity:
USE tempdb;
GO
CREATE TABLE dbo.main_table(column1 INT);
CREATE TABLE dbo.other_table(column2 INT);
INSERT dbo.main_table SELECT 1 UNION ALL SELECT 2;
INSERT dbo.other_table SELECT 1 UNION ALL SELECT 3;
SELECT column1 FROM dbo.main_table
WHERE column1 IN (SELECT column1 FROM dbo.other_table);
Results:
column1
-------
1
2
Why doesn't that raise an error? SQL Server is looking at your queries and seeing that the column1 inside can't possibly be in other_table, so it is extrapolating and "using" the column1 that exists in the outer referenced table (just like you could reference a column that only exists in the outer table without a table reference). Think about this variation:
SELECT [column1] FROM dbo.main_table
WHERE EXISTS (SELECT [column1] FROM dbo.other_table WHERE [column2] = [column1]);
Results:
column1
-------
1
Again SQL Server knows that column1 in the where clause also doesn't exist in the locally referenced table, but it tries to find it in the outer scope. So in an imaginary world you might consider the query to actually be saying:
SELECT m.[column1] FROM dbo.main_table AS m
WHERE EXISTS (SELECT m.[column1] FROM dbo.other_table AS o WHERE o.[column2] = m.[column1]);
(Which is not how I typed it, but if I do type it that way, it still works.)
It doesn't make logical sense in some of the cases but this is the way the query engine does it and the rule has to be applied consistently. In your case (no pun intended), you have an extra complication: case sensitivity. SQL Server didn't find FIELD in your subquery, but it did find it in the outer query. So a couple of lessons:
Always prefix your column references with the table name or alias (and always prefix your table references with the schema).
Always create and reference your tables, columns and other entities using consistent case. Especially when using a binary or case-sensitive collation.
Very interesting find. The unspoken mandate is that you should always alias tables in your subqueries and use those aliases to be explicit about which table each column comes from. Subqueries let you reference a field from your outer query, which is the cause of your issue; in your scenario I would agree that either the inner query's field list should take precedence, or you should get a column-ambiguity error. Regardless, the method below is always preferable:
select * from main_table a where a.field in
(select x.field from other_table x)
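Applied to the original example, the aliased version surfaces the mistake immediately (a sketch; FIELD is assumed absent from other_table on the binary-collation database):
SELECT m.* FROM main_table AS m
WHERE m.field IN (SELECT o.FIELD FROM other_table AS o)
-- Fails with "Invalid column name 'FIELD'": the qualified o.FIELD
-- can no longer bind silently to the outer table's field.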

Count of Distinct Rows Without Using Subquery

Say I have Table1, which has duplicate rows (forget the fact that it has no primary key...). Is it possible to rewrite the following without using a JOIN, subquery or CTE, and also without having to spell out the columns in something like a GROUP BY?
SELECT COUNT(*)
FROM (
SELECT DISTINCT * FROM Table1
) T1
You can do something like this.
SELECT Count(DISTINCT ProductName) FROM Products
but if you want a count of completely distinct records then you will have to use one of the other options you mentioned.
If you wanted to do something like you suggested in the question, that would imply you have duplicate records in your table.
If you didn't have duplicate records, SELECT DISTINCT * FROM table would return the same thing without the DISTINCT.
No, it's not possible.
If you are limited by your framework/query tool/whatever, can't use a subquery, and can't spell out each column name in the GROUP BY, you are SOL.
If you are not limited by your framework/query tool/whatever, there's no reason not to use a subquery.
If you really, really want to do that, you can just "SELECT COUNT(*) FROM table1 GROUP BY all,columns,here" and take the size of the result set as your count; see the sketch below.
But it would be dailywtf worthy code ;)
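For what it's worth, a sketch of that trick (Col1 through Col3 stand in for the spelled-out column list):
-- One row comes back per distinct combination of the grouped columns:
SELECT COUNT(*) FROM Table1 GROUP BY Col1, Col2, Col3;
-- @@ROWCOUNT then holds the number of rows the statement returned,
-- i.e. the number of distinct rows:
SELECT @@ROWCOUNT AS DistinctRows;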
I just wanted to refine the answer by saying that you need to check that the data type of the columns is comparable; otherwise you will get an error trying to make them DISTINCT:
e.g.
com.microsoft.sqlserver.jdbc.SQLServerException: The ntext data type cannot be selected as DISTINCT because it is not comparable.
This is true for large binary, xml and other column types, depending on your RDBMS; read the manual. The solution for SQL Server, for example, is to cast the column from ntext to nvarchar(MAX), which works from SQL Server 2005 onwards.
If you stick to the PK columns then you should be OK (I haven't verified this myself, but I'd have thought logically that PK columns would have to be comparable).
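To illustrate the cast (NotesCol is a hypothetical ntext column in Table1):
SELECT COUNT(*)
FROM (SELECT DISTINCT CAST(NotesCol AS nvarchar(MAX)) AS NotesCol
      FROM Table1) AS t;
-- ntext cannot be selected as DISTINCT; casting to nvarchar(MAX)
-- makes the column comparable.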
