Why is the LIMIT 2 result not a superset of the LIMIT 1 result? - snowflake-cloud-data-platform

I created a temporary table in Snowflake and queried it with the following command:
select mykey
from myDB
limit 1;
The query returns 'X'.
Then I ran the following command:
select mykey
from myDB
limit 2;
The query returns 'Y' and 'Z'. Question:
Why is the result of the second query not a superset of the result of the first query?

You have no ORDER BY clause, so the rows are not returned in any guaranteed order, and each LIMIT may be applied to a different set of rows.
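A small sketch of the fix, using Python's built-in sqlite3 module as a stand-in for Snowflake. This is an assumption: SQLite is not Snowflake, but in both engines LIMIT without ORDER BY makes no promise about which rows you get. With an ORDER BY on a unique key, the LIMIT 1 row is always the first row of the LIMIT 2 result. The table contents are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Snowflake temp table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE myDB (mykey TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO myDB (mykey) VALUES (?)", [("Z",), ("X",), ("Y",)])

def first_keys(n):
    # ORDER BY a unique column fixes the row order, so LIMIT n is a prefix of LIMIT n + 1.
    rows = conn.execute("SELECT mykey FROM myDB ORDER BY mykey LIMIT ?", (n,)).fetchall()
    return [key for (key,) in rows]

limit_1 = first_keys(1)
limit_2 = first_keys(2)
print(limit_1, limit_2)  # ['X'] ['X', 'Y']
```

If mykey can contain duplicates, add a tiebreaker column to the ORDER BY. Otherwise rows with equal keys can still come back in a different order from one run to the next.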

Related

SQL Server - best way to estimate size of result set of query in MB before returning the data?

Given a query that queries multiple tables, is there a way to calculate the size of the result set before returning the full result set?
My best guess so far is to write a query that sums the sizes of all columns and multiplies the total by the row count to get an estimate. I could keep a metatable that stores the average size of each column in each table, refreshed each morning by a stored procedure, since some columns are NVARCHAR(MAX) and have no fixed size.
Materialize the query results into a table and run sp_spaceused on it. Here is my solution:
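SELECT *
INTO tablename                  -- materialize the result set into a new table
FROM ...
EXEC sp_spaceused 'tablename'   -- reports rows, reserved, data, index_size and unused
DROP TABLE tablename            -- clean up afterwards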
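The metatable estimate proposed in the question avoids running the query at all. It is just the estimated row count multiplied by the sum of the average column sizes. The minimal Python sketch below assumes a hypothetical metatable, modelled as a dict (AVG_COLUMN_BYTES) that the nightly stored procedure keeps current. The table names, column names and byte counts are invented for illustration. The row estimate has to come from elsewhere, for example the estimated row count shown in the execution plan.

```python
# Hypothetical metatable contents: average stored size in bytes of each column,
# refreshed nightly by a stored procedure (values invented for illustration).
AVG_COLUMN_BYTES = {
    ("Orders", "Id"): 4,          # INT, fixed size
    ("Orders", "Notes"): 180,     # NVARCHAR(MAX), averaged over existing rows
    ("Customers", "Name"): 60,    # NVARCHAR(50), averaged over existing rows
}

def estimate_result_bytes(estimated_rows, selected_columns):
    """Estimate result size as estimated_rows * sum of average column sizes."""
    bytes_per_row = sum(AVG_COLUMN_BYTES[col] for col in selected_columns)
    return estimated_rows * bytes_per_row

cols = [("Orders", "Id"), ("Orders", "Notes"), ("Customers", "Name")]
estimate = estimate_result_bytes(10_000, cols)
print(f"{estimate / 1024 ** 2:.2f} MB")
```

This ignores per-row and network protocol overhead, so treat the result as a rough estimate.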

Short Circuit EXISTS statement in SQL Server query for Table Valued Parameters

I use stored procedures:
In my WHERE clause, I use short circuits (ORs) to speed up execution, because the Query Optimiser knows that most of my inputs default to NULL. This allows my query to be flexible and fast.
I have added a Table Valued Parameter to the WHERE clause. The execution time for a report has risen from 150ms to 450ms, reads from 70,000 to 200,000.
...
WHERE 1 = 1
--Integer value parameters
AND ((@hID IS NULL) OR (h.ID = @hID))
AND ((@dID IS NULL) OR (d.ID = @dID))
AND ((@mID IS NULL) OR (m.ID = @mID))
--New table-valued parameter
--Execution time and reads increased.
--No additional JOIN added.
AND (NOT EXISTS (SELECT NULL FROM @rIDs) OR r.ID IN (SELECT r FROM @rIDs))
How can I short-circuit the NOT EXISTS or otherwise speed up this query? I have tried adding a BIT value and checking whether the Table Valued Parameter contains rows before executing the query. The only approach I have found is to write two queries and execute one or the other, which is not great if I have to modify a whole bunch of queries or add multiple Table Valued Parameters to the mix.
Thanks in advance.
EDIT:
A comparison of the table-valued parameter:
AND (NOT EXISTS (SELECT NULL FROM @rIDs) OR r.ID IN (SELECT r FROM @rIDs))
and the integer parameter:
AND ((@rID IS NULL) OR (r.ID = @rID))
showed similar execution speed after compilation, with the TVP at 0 rows and the integer parameter NULL. I assume the Query Optimiser is short-circuiting correctly and my previous comparison was flawed. The execution plan splits the cost between the two at 55% vs 45%, which is acceptable. Although the split doesn't change when there are more rows in the TVP, the time to generate the report increases because more pages have to be read from disk. Interesting.
if exists (select * from @rIDs)
begin
.... -- query with TVP
end
else
begin
.... -- query without TVP
end
This allows a separate execution plan for each query.
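The IF EXISTS split is safe because an empty TVP is meant to mean "no filter". The sketch below models that in plain Python. Rows are dicts, the TVP is a set, and the function names are hypothetical. It checks that the single catch-all predicate from the question and the two-branch version return the same rows.

```python
def catch_all(rows, r_ids):
    # Mirrors: NOT EXISTS (SELECT NULL FROM @rIDs) OR r.ID IN (SELECT r FROM @rIDs)
    return [row for row in rows if not r_ids or row["ID"] in r_ids]

def branched(rows, r_ids):
    # Mirrors: IF EXISTS (SELECT * FROM @rIDs) <query with TVP> ELSE <query without TVP>
    if r_ids:
        return [row for row in rows if row["ID"] in r_ids]  # query with TVP
    return list(rows)                                       # query without TVP

rows = [{"ID": 1}, {"ID": 2}, {"ID": 3}]
for tvp in (set(), {2}, {1, 3}):
    assert catch_all(rows, tvp) == branched(rows, tvp)
```

The results are identical. In SQL Server the benefit comes from each branch getting its own execution plan, instead of one plan that must handle both the empty and non-empty cases.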
It looks like you are using a table variable. If you use a temporary table instead and index the column you are filtering on (r in your example), you can avoid a table scan. This makes it a multi-step process, but the payoff can be huge.
More specifically, you can change the last line of your example to:
AND EXISTS (SELECT r FROM @rIDs WHERE r = r.ID AND NOT r IS NULL)
Note that, unlike the original predicate, this returns no rows when @rIDs is empty.
If you could post the execution plan, I could give you a much better answer. Click Display Estimated Execution Plan, right-click the execution plan and select Save Execution Plan As...
You could try a LEFT JOIN between your table to be queried (on the left) and your TVP.

T-SQL not equal operator vs Case statement

Assume I have a T-SQL statement:
select * from MyTable
where Code != 'RandomCode'
I've been tasked with making this kind of WHERE clause perform more quickly. Books Online says that positive queries (=) are faster than negative ones (!=, <>).
So, one option is to turn this into a CASE expression, e.g.:
select * from MyTable
where
case when Code = 'RandomCode' then 0
else 1 end = 1
Does anyone know whether this can be expected to be faster or slower than the original T-SQL?
Thanks in advance.
You need to be more specific about which information from the table you are interested in and what the possible values of the Code column are. Then you can create appropriate indexes to speed up the query.
For example, if values in the Code column could only be one of 'RandomCode', 'OtherCode', 'YetAnotherCode', you can re-write the query as:
SELECT * FROM MyTable WHERE Code = 'OtherCode' OR Code = 'YetAnotherCode'
And of course you need an index on the Code column.
If you have to do an inequality query, you can change SELECT * to a narrower query like:
SELECT Id, Name, Whatever FROM MyTable WHERE Code != 'RandomCode'
Then create an index like:
CREATE INDEX idx_Code ON MyTable(Code) INCLUDE (Id,Name,Whatever)
This can reduce I/O by replacing a table scan with an index scan.
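Before timing the CASE rewrite, check that it is logically equivalent to the original. It is not when Code can be NULL. For a NULL Code, Code != 'RandomCode' evaluates to UNKNOWN, so the row is filtered out. In the CASE form the WHEN condition is UNKNOWN, the ELSE branch yields 1, and the row is kept. The small sketch below uses Python's sqlite3 module. SQLite applies the same three-valued logic here as SQL Server with the default ANSI_NULLS ON; the table data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Code TEXT)")
# Ids 1, 2 and 3 are assigned in insertion order; row 3 has a NULL Code.
conn.executemany("INSERT INTO MyTable (Code) VALUES (?)",
                 [("RandomCode",), ("OtherCode",), (None,)])

not_equal_ids = [row[0] for row in conn.execute(
    "SELECT Id FROM MyTable WHERE Code != 'RandomCode' ORDER BY Id")]
case_ids = [row[0] for row in conn.execute(
    "SELECT Id FROM MyTable "
    "WHERE CASE WHEN Code = 'RandomCode' THEN 0 ELSE 1 END = 1 ORDER BY Id")]

print(not_equal_ids)  # [2]
print(case_ids)       # [2, 3]: the NULL row falls through to ELSE 1
```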

SQL Server where clause using In() vs Wildcard

Is there any performance difference between query A and query B?
Query A
SELECT * FROM SomeTable
WHERE 1 = 1 AND (SomeField LIKE '[1,m][6,e][n]%')
Query B
SELECT * FROM SomeTable
WHERE 1 = 1 AND (SomeField IN ('16', 'Mens'))
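The first could be much slower. An index seek can't be used with LIKE unless the pattern starts with a constant prefix, for example LIKE 'foo%', so the first query will require a scan. The second query, however, could use an index seek on SomeField if such an index is available.
The first query will also give wrong results. It matches '1en' (and ',en', because a comma inside square brackets is just another literal character in the set). It also does not match '16' at all, because the pattern requires a third character 'n'.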
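The mismatch is easy to check outside the database. The sketch below hand-translates the T-SQL pattern into a Python regular expression. Each [...] set carries over unchanged and '%' becomes '.*'. Both predicates are made case-insensitive, on the assumption that the column uses a case-insensitive collation; check your own collation. The function names are hypothetical.

```python
import re

# Hand translation of the T-SQL pattern '[1,m][6,e][n]%':
#   - each [...] set carries over as-is; the comma is a literal member of the set
#   - '%' (any run of characters) becomes '.*'
# re.IGNORECASE assumes a case-insensitive collation (an assumption).
QUERY_A_PATTERN = re.compile(r"[1,m][6,e][n].*", re.IGNORECASE | re.DOTALL)
QUERY_B_VALUES = {"16", "mens"}  # lower-cased to mimic the same case-insensitive collation

def matches_query_a(value):
    return QUERY_A_PATTERN.fullmatch(value) is not None

def matches_query_b(value):
    return value.lower() in QUERY_B_VALUES

for value in ["16", "Mens", "1en", ",en"]:
    print(value, matches_query_a(value), matches_query_b(value))
```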

partition function in SQL Server 2005

I am reading the MSDN page on partition functions, $PARTITION (Transact-SQL).
I am confused about what the sample below does under the hood. My understanding is that this SQL statement iterates over all rows in table Production.TransactionHistory, and for all rows that map to the same partition, $PARTITION.TransactionRangePF1(TransactionDate) returns the same value, i.e. the partition number for those rows. So, for example, all rows in partition 1 will produce one row in the result, since they all have the same value of $PARTITION.TransactionRangePF1(TransactionDate). Is my understanding correct?
USE AdventureWorks ;
GO
SELECT $PARTITION.TransactionRangePF1(TransactionDate) AS Partition,
COUNT(*) AS [COUNT] FROM Production.TransactionHistory
GROUP BY $PARTITION.TransactionRangePF1(TransactionDate)
ORDER BY Partition ;
GO
If your partition function is defined like this:
CREATE PARTITION FUNCTION TransactionRangePF1(DATETIME)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01')
then this expression:
$PARTITION.TransactionRangePF1(TransactionDate)
is equivalent to:
CASE
WHEN TransactionDate < '2007-01-01' THEN 1
WHEN TransactionDate < '2008-01-01' THEN 2
WHEN TransactionDate < '2009-01-01' THEN 3
ELSE 4
END
If all your dates fall before '2007-01-01', then the first WHEN clause will always fire and it will always return 1.
The query you posted will return at most 1 row for each partition, as it will group all the rows from the partition (if any) into one group.
If a partition has no rows, no row for it will be returned in the result set.
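The RANGE RIGHT mapping and the GROUP BY behaviour can be modelled in a few lines of Python. This is a sketch, not SQL Server's implementation. It uses the three boundary values from the example function, plain dates instead of DATETIME values, and invented transaction dates, and it does not handle NULLs. bisect_right places a value equal to a boundary in the partition to its right, which matches RANGE RIGHT and the CASE expression above. Counting with Counter shows why empty partitions never appear in the result.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Boundary values from the example: RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01')
BOUNDARIES = [date(2007, 1, 1), date(2008, 1, 1), date(2009, 1, 1)]

def partition_range_right(value):
    """1-based partition number; a boundary value belongs to the partition on its right."""
    return bisect_right(BOUNDARIES, value) + 1

def rows_per_partition(transaction_dates):
    # Like GROUP BY $PARTITION...(TransactionDate) with COUNT(*): empty partitions never appear.
    counts = Counter(partition_range_right(d) for d in transaction_dates)
    return sorted(counts.items())

dates = [date(2006, 5, 1), date(2007, 1, 1), date(2007, 6, 30), date(2010, 2, 2)]
print(rows_per_partition(dates))  # [(1, 1), (2, 2), (4, 1)]
```

Partition 3 has no rows, so it is missing from the output, just as it would be missing from the query's result set.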
It returns the number of records in each of the non-empty partitions in the partitioned table Production.TransactionHistory, so yes your reasoning is correct.
Have you tried generating an execution plan for the statement? That might give you some insight into what it's actually doing under the covers.
Press "Control-L" to generate an execution plan and post it here if you'd like some interpretation.
