Use of FireDAC's macrofunction LIMIT on subqueries - sql-server

I want to know if there are any restrictions on using FireDAC's "LIMIT" macrofunction on subqueries like this:
SELECT TOP 10 * FROM TABLE1 WHERE NOT EXISTS( SELECT TOP 1 FIELD1 FROM TABLE2 )
With LIMIT applied it would be as follows:
SELECT {LIMIT(0,10)} * FROM TABLE WHERE NOT EXISTS( SELECT {LIMIT(0,10)} FIELD1 FROM TABLE2 )
If so, I would like to know what alternatives exist to limit the number of rows returned in a subquery considering that it is necessary to be compatible with more than one database manager (Oracle and SQL Server).
This is a simplified use case, in a real scenario it is expected to use this macrofunction in much more complex queries.
Thanks in advance.

Related

Select where id not in (...) has any limit? - NOT IN VS NOT EXISTS

I want to select some rows from a catalog which are not related to two table (Table1 and Table2).
The rows in the catalog are related to the tables using the field idA.
I am using the following query:
;with CTE(id) AS(
select distinct idA from Table1
union
select distinct idA from Table2
)
select * from Catalog where IdA NOT IN (select id from CTE)
The very strange part is that this query does not return any result.
If I use the following query after de CTE it will return some rows
select Id from Catalog where not exists (select 1 from CTE where Catalog.IdA = CTE.Id);
Does anyone know what is the reason of this? AFAK both queries are equivalents.
Of course the first query is slower than the second one but that is not very important. The important part is, why those queries do not return the same results?
Maybe it's important to mention that Table1 has more than 3.9 million rows
but just 139,283 different values for idA.
Table2 has only some thousands rows
Any help or comment will be appreciated
If one or more NULL values are returned by a NOT IN subquery, the result of the predicate is unknown rather than true or false and no rows returned due to the unsatisfied condition. It is best to avoid NOT IN with nullable expressions to avoid this non-intuitive behavior.
Although slighly more verbose, a correlated NOT EXISTS subquery will always return true or false and avoid the NULL gotcha.

How does t-sql update work without a join

I think my head is muddy or something. I'm trying to figure out how a t-sql update works without a join when updating one table from another. I've always used joins in the past but came across a stored proc where someone else created one without a join. This update is being used in SQL 2008R2 and it works.
Update table1
SET col1 = (SELECT TOP 1 colX FROM table2 WHERE colZ = colY),
col2 = (SELECT TOP 1 colE FROM table2 WHERE colZ = colY)
Obviously, colY is a field in table1. To get the same results in a select statement (not update), a join is required. I guess I don't understand how an update works behind the scenes but it must be doing some kind of join?
SQL Server translates those subqueries into joins. You can look at this by getting the query plan. You can write an equivalent query with UPDATE ... FROM ... JOIN syntax and observe the query plan to be essentially the same.
The sample code shown is unusual, hard to understand, redundant and inflexible. I recommend against using this style.
No it's doing a sub query, well two in this case. Be damn painful if you have another 98 col fields.
You can do something similar for select
select *,
(SELECT TOP 1 colX FROM table2 WHERE colZ = colY) as col1
From table1
A left join would simply be more efficient
Your example unless the dbms optimises it it running the subquery(ies) for each row in table.
Got to say whoever wrote it is less than competent.
These subqueries are what is called correlated subqueries. If you were to write the same query as a SELECT rather than an UPDATE it would look like this.
SELECT col1 = (SELECT TOP 1 table2.colX FROM table2 WHERE table2.colZ = table1.colY),
col2 = (SELECT TOP 1 table2.colE FROM table2 WHERE table2.colZ = table1.colY)
FROM table1
The JOIN is in the fact that you are referencing a column from an outside table on the inside of the subquery. Table1 is referenced in the UPDATE command. You can include a FROM clause but it isn't required for a setup like this.
You can use the same syntax in a SELECT with no join, but you need to alias the table if colY also exists in table2
SELECT (SELECT TOP 1 colX FROM table2 WHERE colZ = T.colY)
, (SELECT TOP 1 colE FROM table2 WHERE colZ = T.colY)
FROM table1 AS T
I only ever use this sort of thing when building up an ad hoc query just for my own infomation. If it's going to be put into any sort of permanent code I'll convert it to a join as it's easier to read and more maintainable.

Transact SQL parallel query execution

Suppose I have
INSERT INTO #tmp1 (ID) SELECT ID FROM Table1 WHERE Name = 'A'
INSERT INTO #tmp2 (ID) SELECT ID FROM Table2 WHERE Name = 'B'
SELECT ID FROM #tmp1 UNION ALL SELECT ID FROM #tmp3
I would like to run queries 1 & 2 in parallel, and then combine results after they are finished.
Is there a way to do this in pure T-SQL, or a way to check if it will do this automatically?
A background for those who wants it: I investigate a complex search where there're multiple conditions which are later combined (term OR (term2 AND term3) OR term4 AND item5=term5) and thus I investigate if it would be useful to execute those - largely unrelated - conditions in parallel, later combining resulting tables (and calculating ranks, weights, and so on).
E.g. should be several resultsets:
SELECT COUNT(*) #tmp1 union #tmp3
SELECT ID from (#tmp1 union #tmp2) WHERE ...
SELECT * from TABLE3 where ID IN (SELECT ID FROM #tmp1 union #tmp2)
SELECT * from TABLE4 where ID IN (SELECT ID FROM #tmp1 union #tmp2)
You don't. SQL doesn't work like that: it isn't procedural. It leads to race conditions and data issues because of other connections
Table variables are also scoped to the batch and connection so you can't share results over 2 connections in case you're wondering.
In any case, all you need is this, unless you gave us an bad example:
SELECT ID FROM Table1 WHERE Name = 'A'
UNION
SELECT ID FROM Table2 WHERE Name = 'B'
I suspect you're thinking of "run in parallel" because of this procedural thinking. What is your actual desired problem and goal?
Note: table variables do not allow parallel operations: Can queries that read table variables generate parallel exection plans in SQL Server 2008?
You don't decide what to parallelise - SQL Server's optimizer does. And the largest unit of work that the optimizer will work with is a single statement - so, you find a way to express your query as a single statement, and then rely on SQL Server to do its job, which it will usually do quite well.
If, having constructed your query, the performance isn't acceptable, then you can look at applying hints or forcing certain plans to be used. A lot of people break their queries into multiple statements, either believing that they can do a better job than SQL Server, or because it's how they "naturally" think of the task at hand. Both are "wrong" (for certain values of wrong), but if there's a natural breakdown, you may be able to replicate it using Common Table Expressions - these would allow you to name each sub-part of the problem, and then combine them together, all as part of a single statement.
E.g.:
;WITH TabA AS (
SELECT ID FROM Table1 WHERE Name = 'A'
), TabB AS (
SELECT ID FROM Table2 WHERE Name = 'B'
)
SELECT ID FROM TabA UNION ALL SELECT ID FROM TabB
And this will allow the server to decide how best to resolve this query (e.g. deciding whether to store intermediate results in "temp" tables)
Seeing in one of your other comments you discussing about having to "work with" the intermediate results - this can still be done with CTEs (if it's not just a case of you failing to be able to express the "final" result as a single query), e.g.:
;WITH TabA AS (
SELECT ID FROM Table1 WHERE Name = 'A'
), TabAWithCalcs AS (
SELECT ID,(ID*5+6) as ModID from TabA
)
SELECT * FROM TabAWithCalcs
Why not just:
SELECT ID FROM Table1 WHERE Name = 'A'
UNION ALL
SELECT ID FROM Table2 WHERE Name = 'B'
then if SQL Server wants to run the two selects in parallel, it will do at its own violition.
Otherwise we need more context for what you're trying to achieve if this isn't practical.

Usage of IN Clause in a Database query

I have written query, I wanted to know the effect of usage of IN clause in Query.
Query I
Select * from khatapayment_track
Where Property_ID IN (Select Property_id from khata_header where DIV_ID=2)
Query II
Select * from khatapayment_track KPT, khata_header KH
Where kh.Property_id=KPT.Property_id and KH.DIV_Id=2
I would like to know
1) which one is faster
2) Any effects of using IN clause, is it advisable to use if a query has a 3 IN clause.
Can you please help me with examples
Your second query is faster, but it is better to use joins (it looks nicer and it have the same execution plan):
select
*
from
khatapayment_track t
inner join khata_header h on (h.property_id = t.property_id)
where
h.div_id = 2
Also you can use mysql profiler to compare your queries.
Query II is faster in your example as you are using subquery in Query I.
but consider follwing example which returns similar o/p but query i will returns faster
select * from tableName where id in (1,2,3,4)
is similar to
select * from tableName where id =1 OR id =2 OR id =3 Or id =4

Count of Distinct Rows Without Using Subquery

Say I have Table1 which has duplicate rows (forget the fact that it has no primary key...) Is it possible to rewrite the following without using a JOIN, subquery or CTE and also without having to spell out the columns in something like a GROUP BY?
SELECT COUNT(*)
FROM (
SELECT DISTINCT * FROM Table1
) T1
You can do something like this.
SELECT Count(DISTINCT ProductName) FROM Products
but if you want a count of completely distinct records then you will have to use one of the other options you mentioned.
If you wanted to do something like you suggested in the question, then that would imply you have duplicate records in your table.
If you didn't have duplicate records SELECT DISTINCT * from table would be the same without the distinct.
No, it's not possible.
If you are limited by your framework/query tool/whatever, can't use a subquery, and can't spell out each column name in the GROUP BY, you are SOL.
If you are not limited by your framework/query tool/whatever, there's no reason not to use a subquery.
if you really really want to do that you can just "SELECT COUNT(*) FROM table1 GROUP BY all,columns,here" and take the size of the result set as your count.
But it would be dailywtf worthy code ;)
I just wanted to refine the answer by saying that you need to check that the datatype of the columns is comparable - otherwise you will get an error trying to make them DISTINCT:
e.g.
com.microsoft.sqlserver.jdbc.SQLServerException: The ntext data type cannot be selected as DISTINCT because it is not comparable.
This is true for large binary, xml columns and others depending on your RDBMS - rtm. The solution for SQLServer for example is to cast it from an ntext to an nvarchar(MAX) from SQLServer 2005 onwards.
If you stick to the PK columns then you should be OK (I haven't verified this myself but I'd have thought logically that PK columns would have to be comparable)

Resources