select top 1 * vs select top 1 1 - sql-server

I know there's a lot of these questions, but I can't find one that relates to my question.
Looking at this question, Is Changing IF EXIST(SELECT 1 FROM ) to IF EXIST(SELECT TOP 1 FROM ) has any side effects?
Specifically referring to this section in the answer:
select * from sys.objects
select top 1 * from sys.objects
select 1 where exists(select * from sys.objects)
select 1 where exists(select top 1 * from sys.objects)
I'm running some of my own tests to properly understand it. As indicated in the answer:
select 1 where exists(select top 1 * from sys.objects)
select 1 where exists(select top 1 1 from sys.objects)
both produce the same execution plan, which is also the same plan as
select 1 where exists(select * from sys.objects)
select 1 where exists(select 1 from sys.objects)
From my research into questions like “SELECT TOP 1 1” VS “IF EXISTS(SELECT 1”, I'm deducing that this is the agreed best practice:
select 1 where exists(select * from sys.objects)
My first question is why this is preferred over:
select 1 where exists(select 1 from sys.objects)
In trying to understand it, I broke them down to their more basic expressions (I'm using 'top 1' to mimic an execution plan resembling exists):
select top 1 * from sys.objects
select top 1 1 from sys.objects
I now see that the first is 80% of the execution time (relative to the batch of 2) whilst the second is only 20%. Would it then not be better practice to use
select 1 where exists(select 1 from sys.objects)
as it can be applied to both scenarios and thereby reduce possible human error?

SQL Server detects the EXISTS predicate relatively early in the query compilation / optimisation process, and eliminates actual data retrieval for such clauses, replacing them with existence checks. So your assumption:
I now see that the first is 80% of the execution time (relative to the batch of 2) whilst the second is only 20%.
is wrong, because in the preceding comparison you actually retrieved some data, which doesn't happen if the query is put into a (not) exists predicate.
Most of the time, it makes no difference how you test for the existence of rows, except for a single yet important catch. Suppose you say:
if exists (select * from dbo.SomeTable)
...
somewhere in a code module (view, stored procedure, function, etc.). Then, later, when someone else decides to add a WITH SCHEMABINDING clause to this code module, SQL Server will not allow it; instead of binding to the current list of columns, it will throw an error:
Msg 1054, Level 15, State 7, Procedure BoundView, Line 6
Syntax '*' is not allowed in schema-bound objects.
So, in short:
if exists (select 0 from ...)
is the safest, fastest, one-size-fits-all way to write existence checks.
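To illustrate the schema-binding catch, here is a minimal sketch. It assumes a hypothetical one-column table dbo.SomeTable (the table definition is made up for illustration); the view name BoundView is borrowed from the error message above. The commented-out definition is the form that should be rejected with Msg 1054, while the second one uses a constant instead of * and can be schema-bound.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.SomeTable (Id int NOT NULL PRIMARY KEY);
GO

-- Expected to fail: '*' is not allowed in schema-bound objects.
-- CREATE VIEW dbo.BoundView
-- WITH SCHEMABINDING
-- AS
-- SELECT 1 AS HasRows
-- WHERE EXISTS (SELECT * FROM dbo.SomeTable);

-- Same existence check written with a constant, which schema binding accepts.
CREATE VIEW dbo.BoundView
WITH SCHEMABINDING
AS
SELECT 1 AS HasRows
WHERE EXISTS (SELECT 0 FROM dbo.SomeTable);
GO
```

Note that schema binding also requires two-part names (dbo.SomeTable rather than SomeTable), which is why the sketch spells out the schema.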

The difference between these two:
select top 1 * from sys.objects
select top 1 1 from sys.objects
is that in the first query SQL Server must fetch all the columns from the table (from an arbitrary row), but in the second it is enough to fetch "1" from any index.
Things change when these queries are inside an exists clause, because in that case SQL Server knows that it doesn't actually have to fetch the data, since it will not be assigned to anything, so it can handle select * the same way it would handle select 1.
Since exists only needs to find one row, it has an internal top 1 built into it, so adding it manually doesn't change anything.
Whether to have select * or select 1 in an exists clause is just a matter of opinion, and instead of 1 you could of course have 2 or 'X' or whatever else you like. Personally I always use ... and exists (select 1 ...

EXISTS is a type of subquery which can only return a boolean value based upon whether any rows are returned by the subquery. Selecting 1, or *, or whatever else doesn't matter within this context because the result is always just true or false.
You can verify this by testing that these two statements produce the exact same plan.
select 1 where exists(select * from sys.objects)
select 1 where exists(select 1 from sys.objects)
What you select in your outer query DOES matter. As you found, these two statements produce very different execution plans:
select top 1 * from sys.objects
select top 1 1 from sys.objects
The first one will be slower because it has to actually return real data. In this case, that means joining the three underlying tables: syspalnames, syssingleobjrefs, and sysschobjs.
As to the preference of what you put inside your EXISTS subqueries - SELECT 1 or SELECT * - it doesn't matter. I usually say SELECT 1, but SELECT * is just as good and you'll see it in a lot of Microsoft documentation.
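One way to compare the plans yourself is sketched below. It assumes you run it from SSMS or sqlcmd (GO is a batch separator for those tools, not T-SQL) and that you have permission to view plans. With SHOWPLAN_TEXT on, the statements are not executed; SQL Server returns the estimated plan text for each one instead, so the EXISTS variants can be compared side by side.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- These are compiled but not executed; compare the plan text returned for each.
SELECT 1 WHERE EXISTS (SELECT * FROM sys.objects);
SELECT 1 WHERE EXISTS (SELECT 1 FROM sys.objects);
SELECT 1 WHERE EXISTS (SELECT TOP 1 1 FROM sys.objects);
GO

SET SHOWPLAN_TEXT OFF;
GO
```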

I was looking for an answer to just the actual question contained in the title. I found it at this link:
Select Top 1 or Top n basically returns the first n rows of data based
on the sql query. Select Top 1 1 or Top n s will return the first n
rows with data s depending on the sql query.
For example, the query below produces the first name and last name of
the first 10 matches. This query will return first name and last name
only.
SELECT TOP 10 FirstName, LastName
FROM [tblUser]
where EmailAddress like 'john%'
Now, look at this query with select top 10 'test' - this will produce
the same number of rows as in the previous query (same database, same
condition) but the values will be 'test'.
SELECT TOP 10 'test'
FROM [tblUser]
where EmailAddress like 'john%'
So, select TOP 1 * returns the first row, while select TOP 1 1 returns one row containing just "1". This holds if the query matches at least one row; otherwise both return an empty result set (no rows), not NULL.
As an additional example, this:
SELECT TOP 10 'test', FirstName
FROM [tblUser]
where EmailAddress like 'john%'
will return a table containing a column filled with "test" and another column filled with the first name of the first 10 matches of the query.
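The behaviour described above can be tried without a real table. This is only a sketch: the table variable @tblUser and its sample rows are made up for illustration and stand in for the [tblUser] table from the quoted examples.

```sql
-- Hypothetical stand-in for [tblUser], with made-up sample data.
DECLARE @tblUser TABLE
(
    FirstName    nvarchar(50),
    LastName     nvarchar(50),
    EmailAddress nvarchar(100)
);

INSERT INTO @tblUser (FirstName, LastName, EmailAddress)
VALUES (N'John', N'Smith', N'john.smith@example.com'),
       (N'Jane', N'Doe',   N'jane.doe@example.com');

-- One row with every column of the matching user.
SELECT TOP 1 * FROM @tblUser WHERE EmailAddress LIKE 'john%';

-- One row with a single unnamed column holding the constant 1.
SELECT TOP 1 1 FROM @tblUser WHERE EmailAddress LIKE 'john%';

-- No match: an empty result set (zero rows), not a row containing NULL.
SELECT TOP 1 1 FROM @tblUser WHERE EmailAddress LIKE 'nobody%';
```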

Related

SQL Server : concatenate results of 2 queries as a string

I am new to SQL, and I need to make a string out of the UNION result. I have seen many similar questions but they were either related to concatenating results of a single SQL query, or they were using JOIN on some kind of row id, while I do not need this and do not have any column on which I can use JOIN.
I have the following UNION:
(
SELECT COUNT(*)
FROM [db].[table1]
WHERE [ItemType] = 2
)
UNION
(
SELECT TOP(1) Items
FROM [db].[table2]
WHERE [ItemType] = 2
)
It returns a simple result with two rows:
15
10
15 is the total number of items, and 10 is the number of items left available.
I want to return a table with only one entry 10/15. What is the simplest way to achieve it? Thanks.
A bit of a guess, but perhaps:
SELECT CONCAT((SELECT COUNT(*) FROM [db].[table1] WHERE [ItemType] = 2), '/',
(SELECT TOP(1) Items FROM [db].[table2] WHERE [ItemType] = 2 ORDER BY {Your Column}));
Note the {Your Column}, which you need to replace with an appropriate column to give a correct, consistent result. A TOP without an ORDER BY can (and will) produce inconsistent results, because SQL Server does not guarantee any row order without one; the "TOP 1" will then be whatever row SQL Server "finds" (retrieves) first, which is effectively arbitrary.
You have lurking problems in what you are trying to do.
First, UNION removes duplicate rows. If you happen to have 15 total items AND 15 available, you'll only get one row back. That's not what you want. (UNION ALL would fix this, but you don't need UNION at all.)
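The duplicate-removal behaviour is easy to see with constants. This small sketch uses literal values rather than the tables from the question:

```sql
-- UNION removes duplicates, so two equal values collapse into a single row.
SELECT 15 AS n
UNION
SELECT 15;

-- UNION ALL keeps every row, so this returns two rows.
SELECT 15 AS n
UNION ALL
SELECT 15;
```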
Your next issue(?) may be your data model choice. The second table (Items) has values in it; you are pulling one row out for a particular type, but you don't have any control over which one gets pulled out. It could be any value in the set. If you want the count of items available, as opposed to whichever item SQL Server happens to pick first, then you may want to adjust your query. (For "give me the first item and it actually is the count", you would add an ORDER BY to that subselect so SQL Server picks the proper "first" row in a given sort order. For "I want the count of distinct items in this table", you might want count(*) or count(distinct item), depending on your semantics.)
Once you sort these two things out, you can use two subselects to get each scalar value and then convert them to strings, as you are attempting to do in your example. Here's an example of how the pattern should look once you clarify your data model issue:
select convert(nvarchar(100),a) + '/' + convert(nvarchar(100),b) FROM
(
select (select count(*) from sys.objects) as a, (select count(*) from sys.objects) as b
) C
Result:
101/101
It's a fairly quick operation, so for readability I'd try something like this. You can also format the result easily at the bottom, should you need to add dashes or extra spaces.
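DECLARE @A int,
        @B int
SET @A = (
    SELECT COUNT(*)
    FROM [db].[table1]
    WHERE [ItemType] = 2
)
SET @B = (
    SELECT TOP(1) Items
    FROM [db].[table2]
    WHERE [ItemType] = 2
)
-- Cast to strings first: int + '/' would try to convert '/' to int and fail
SELECT (CAST(@A AS varchar(11)) + '/' + CAST(@B AS varchar(11))) as result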

WITH is not working as I expected sqlServer 2012

I am getting different results from a WITH statement. Here is my first query:
with q as (select top (100000) * from table1) select * from q
Let's say that table1 has an ID field. Everything seems normal if I execute that query; it works as I expected. But if I change the statement like this:
with q as (select top (100000) * from table1) select [ID] from q
or
with q as (select top (100000) * from table1) select q.[ID] from q
it brings back results that do not exist in the first query (note that I only select ID). I understand that a WITH statement is a temporary result set, and I expect both queries to return the same rows no matter how many fields I select, so why is this happening? This could be a problem if I want to perform an update, or even worse a delete, because I would not be completely sure that I had affected the rows I wanted.
If you select top x without an order by, the result set is arbitrarily returned. Meaning you can get a different result set if you execute it twice. Since you are changing the query slightly, I'm not surprised the result set is different. Add an ORDER BY if you SELECT TOP x
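Applied to the queries from the question, that could look like the sketch below. It assumes [ID] is unique in table1; if the ordering column has ties, the choice of rows at the cut-off is still arbitrary.

```sql
-- ORDER BY is allowed inside a CTE when TOP is present, and it pins down
-- which 100000 rows TOP selects, regardless of the outer column list.
WITH q AS
(
    SELECT TOP (100000) *
    FROM table1
    ORDER BY [ID]
)
SELECT [ID]
FROM q;
```

The ORDER BY inside the CTE only decides which rows qualify; add another ORDER BY to the outer query if the output order matters as well.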

SQL Server user defined function returns table -- cannot call it from select query

I cannot get this type of select query (pseudo-code) to work. The UDF returns a table with 8 columns in a single row for a given 'UID_VEHICLE'. It works perfectly when the 'UID_VEHICLE' is provided as a constant like 3308. But I need one row of these function-results for each vehicle for a given customer -- up to 100 rows to be returned.
SELECT
*
FROM
[dbo].[fnGetNextDOT_InspectionData](UID_VEHICLE)
WHERE
UID_VEHICLE IN (SELECT UID_VEHICLE
FROM tVEHICLES
WHERE UID_CUSTOMER = 88);
Your comments and solutions are welcome...thanks...John
When passing row values from a query into a TVF, you need to use CROSS APPLY or OUTER APPLY (starting with SQL Server 2005):
SELECT * -- or dot.*, or whatever is desired
FROM tVEHICLES veh
CROSS APPLY [dbo].[fnGetNextDOT_InspectionData](veh.UID_VEHICLE) dot
WHERE veh.UID_CUSTOMER = 88;
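If some vehicles might produce no row from the function, the OUTER APPLY variant mentioned above keeps them in the result. This is a sketch using the same table and function names as the question; whether the function can actually return zero rows is an assumption about your data.

```sql
-- OUTER APPLY keeps every vehicle for the customer; the function's columns
-- come back as NULL for vehicles where it returns no row.
SELECT veh.UID_VEHICLE, dot.*
FROM tVEHICLES veh
OUTER APPLY [dbo].[fnGetNextDOT_InspectionData](veh.UID_VEHICLE) dot
WHERE veh.UID_CUSTOMER = 88;
```

CROSS APPLY behaves like an inner join (vehicles without a function result are dropped), while OUTER APPLY behaves like a left join.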

Cannot update Column Windowed function error

Assuming that I had previously declared a table called #temp which has count as NULL values, and later on I wanted to update that column in my Script how would I do that?
count --- CAM
1 201
1 2
1 2012
2 20
I have the update statement which would be:
Update #temp set [count]= ((ROW_NUMBER() over(order by CAM desc)-1/3)+1
However, it gives me the following error:
Windowed functions can only appear in the SELECT or ORDER BY clauses.
I have tried many different ways using a select statement, but no luck! Any help on this?
If I'm understanding what you want to do, although count is a bit of an odd column name here given the data it seems to hold:
WITH cte AS
(
SELECT (row_number() OVER(ORDER BY CAM DESC) - 1)/3 + 1 AS [count],
CAM
FROM #temp
)
UPDATE #temp
SET #temp.[count] = cte.[count]
FROM #temp
INNER JOIN cte ON #temp.CAM = cte.CAM
Note I've also pulled the /3 outside of the parentheses - I believe this is what you've intended.
This will work as long as CAM is unique.
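If CAM is not guaranteed to be unique, one alternative is to update through the CTE itself instead of joining back to #temp. This is a sketch under the same assumptions as the query above (a #temp table with [count] and CAM columns); the name new_count is made up for illustration.

```sql
WITH cte AS
(
    SELECT [count],
           -- (n - 1) / 3 + 1 with integer division: rows 1-3 get 1, rows 4-6 get 2, ...
           -- The original "n - 1/3" evaluates 1/3 first, which is 0 in integer arithmetic.
           (ROW_NUMBER() OVER (ORDER BY CAM DESC) - 1) / 3 + 1 AS new_count
    FROM #temp
)
UPDATE cte
SET [count] = new_count;
```

Because the CTE reads from a single table, the UPDATE writes straight through to #temp and no join on CAM is needed. With duplicate CAM values, the order within each group of ties is still arbitrary.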

SQL Server and intermediate materialization?

After reading this interesting article about intermediate materialization - I still have some questions.
I have this query :
SELECT *
FROM ...
WHERE isnumeric(MyCol)=1 and ( CAST( MyCol AS int)>1)
However, the order in which the WHERE clause conditions are evaluated is not deterministic.
So I might get an exception here (if it first tries to cast "k1k1").
I assume this will solve the problem:
SELECT MyCol
FROM
(SELECT TOP 100 PERCENT MyCol FROM MyTable WHERE ISNUMERIC(MyCol) = 1 ORDER BY MyCol) bar
WHERE
CAST(MyCol AS int) > 100
Why would adding TOP 100 PERCENT plus ORDER BY change anything compared with my regular query?
I read in the comments :
(the "intermediate" result -- in other words, a result obtained during
the process, that will be used to calculate the final result) will be
physically stored ("materialized") in TempDB and used from there for
the remainder of the user, instead of being queried back from the base
tables.
What difference does it make whether it is stored in tempDB or queried back from the base tables? It is the same data!
The supported way to avoid errors due to the optimizer reorganizing things is to use CASE:
SELECT *
FROM YourTable
WHERE
1 <=
CASE
WHEN aa NOT LIKE '%[^0-9]%'
THEN CONVERT(int, aa)
ELSE 0
END;
Intermediate materialization is not a supported technique, so it should only be employed by very expert users in special circumstances where the risks are understood and accepted.
TOP 100 PERCENT is generally ignored by the optimizer in SQL Server 2005 onward.
By adding the TOP clause into the inner query, you're forcing SQL Server to run that query first before it runs the outer query - thereby discarding all rows for which ISNUMERIC returns false.
Without the TOP clause, the optimiser can rewrite the query to be the same as your first query.
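As a different technique from the ones discussed above, and only if you are on SQL Server 2012 or later, TRY_CAST sidesteps the evaluation-order problem because it returns NULL instead of raising an error when the conversion fails. The sketch below uses a made-up table variable and sample values for illustration.

```sql
-- Hypothetical sample data, including values that cannot be cast to int.
DECLARE @MyTable TABLE (MyCol varchar(10));

INSERT INTO @MyTable (MyCol)
VALUES ('5'), ('150'), ('k1k1'), ('$');

-- TRY_CAST yields NULL for 'k1k1' and '$', so the comparison is not true for
-- those rows and they are filtered out without a conversion error.
SELECT MyCol
FROM @MyTable
WHERE TRY_CAST(MyCol AS int) > 1;
```

This also avoids relying on ISNUMERIC, which returns 1 for some strings (such as '$') that cannot be converted to int.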
