set difference in SQL query - sql-server

I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run ~15 sec.
Is there a better way of writing this SQL?

If you would like to represent set difference (A-B) in SQL, here is a solution.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
-- (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.
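As a rough check of the anti-join idea, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and the sample rows and the `name` column are made up for illustration. It also compares the result with a NOT EXISTS formulation, which is another common way to spell the same difference.

```python
import sqlite3

# Hypothetical sample data; only the table names and ID column come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, name TEXT);
    CREATE TABLE B (ID INTEGER, name TEXT);
    INSERT INTO A VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO B VALUES (2, 'two');
""")

# Anti-join: keep rows of A for which the outer join found no partner in B.
anti_join = conn.execute("""
    SELECT DISTINCT A.*
    FROM A LEFT OUTER JOIN B ON A.ID = B.ID
    WHERE B.ID IS NULL
    ORDER BY A.ID
""").fetchall()

# The same difference written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT A.*
    FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)
    ORDER BY A.ID
""").fetchall()

print(anti_join)
print(not_exists)
```

Both queries return the rows with ID 1 and 3 on this data. Which form is faster on a real SQL Server table depends on indexes and data, so compare the execution plans.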

You don't need to return data from the nested subqueries. I'm not sure this will make a difference without indexing, but it's easier to read.
And EXISTS/JOIN is probably nicer, IMHO, than using IN
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But, as marc_s said, you'll need at least a computed column for LEFT(B, 5),
and 2 indexes: one on (computed, C) and another on (C, computed)
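To see what the JOIN/HAVING version selects, here is a small sketch on made-up data, run in SQLite through Python's sqlite3 module. SQLite has no LEFT(), so `substr(B, 1, 5)` is used as an assumed equivalent of `LEFT(B, 5)`. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (B TEXT, C INTEGER);
    INSERT INTO A VALUES
        ('abcde1', 1), ('abcde2', 1),   -- prefix abcde pairs only with C = 1: kept
        ('fghij1', 2), ('fghij2', 3),   -- prefix fghij has two C values: dropped
        ('klmno1', 4), ('pqrst1', 4);   -- C = 4 has two prefixes: dropped
""")

# Keep rows whose 5-character prefix maps to exactly one C
# and whose C maps to exactly one 5-character prefix.
rows = conn.execute("""
    SELECT A.B, A.C
    FROM A
    JOIN (SELECT substr(B, 1, 5) AS b1
          FROM A
          GROUP BY substr(B, 1, 5)
          HAVING COUNT(DISTINCT C) = 1) t1 ON substr(A.B, 1, 5) = t1.b1
    JOIN (SELECT C AS c1
          FROM A
          GROUP BY C
          HAVING COUNT(DISTINCT substr(B, 1, 5)) = 1) t2 ON A.C = t2.c1
    ORDER BY A.B
""").fetchall()

print(rows)
```

Only the two `abcde` rows survive, because that prefix and C = 1 map to each other one-to-one.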

Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?
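The effect of indexing the expression can be sketched outside SQL Server too. The example below uses SQLite (via Python's sqlite3), where an index on an expression plays a role similar to an index on a persisted computed column. This is an analogy, not the same feature; the index name `ix_a_left5` and the sample rows are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (B TEXT, C INTEGER);
    INSERT INTO A VALUES ('abcde1', 1), ('abcde2', 1), ('fghij1', 2);
    -- SQLite analogue of indexing the computed LEFT(B, 5) column.
    CREATE INDEX ix_a_left5 ON A (substr(B, 1, 5));
""")

# Ask the planner how it would run a lookup on the same expression.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM A WHERE substr(B, 1, 5) = 'abcde'
""").fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)

rows = conn.execute(
    "SELECT B FROM A WHERE substr(B, 1, 5) = 'abcde' ORDER BY B"
).fetchall()
print(rows)
```

The plan text should mention `ix_a_left5`, which shows the lookup can be served from the index once the expression itself is indexed. In SQL Server the equivalent check is to look for an index seek on IX_LeftB5 in the execution plan.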

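If you are just looking for the set difference between table1 and table2,
the query below returns the rows that are in table1 but not in table2, assuming both tables share the same schema, with columns named
columnone, columntwo, ...
Note that it checks each column independently, so it drops any table1 row that shares even one column value with some row of table2; that is stricter than a row-by-row set difference.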
with
col1 as (
select columnone from table2
),
col2 as (
select columntwo from table2
)
...
select * from table1
where (
columnone not in (select columnone from col1)
and columntwo not in (select columntwo from col2)
...
);
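The difference between a per-column NOT IN check and a row-wise set difference is easy to miss, so here is a small sketch comparing the two on made-up data. It runs in SQLite through Python's sqlite3 as a stand-in; the row-wise version uses EXCEPT, which SQL Server also supports. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (columnone INTEGER, columntwo INTEGER);
    CREATE TABLE table2 (columnone INTEGER, columntwo INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table2 VALUES (1, 10), (2, 99);
""")

# Row-wise set difference: whole rows of table1 that do not appear in table2.
row_wise = conn.execute("""
    SELECT * FROM table1
    EXCEPT
    SELECT * FROM table2
    ORDER BY columnone
""").fetchall()

# Column-wise check: each column value must be absent from table2's column.
column_wise = conn.execute("""
    WITH col1 AS (SELECT columnone FROM table2),
         col2 AS (SELECT columntwo FROM table2)
    SELECT * FROM table1
    WHERE columnone NOT IN (SELECT columnone FROM col1)
      AND columntwo NOT IN (SELECT columntwo FROM col2)
    ORDER BY columnone
""").fetchall()

print(row_wise)     # [(2, 20), (3, 30)]
print(column_wise)  # [(3, 30)]
```

The row (2, 20) is not in table2, so EXCEPT keeps it, but the column-wise query drops it because the value 2 appears in table2.columnone.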

Related

SQL - Attain Previous Transaction Information [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that the "previous" variable references the most recently selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows; that's not a problem, it's actually exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
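Here is a minimal sketch of the LAG approach on a table whose ids have gaps, run in SQLite (3.25 or later) through Python's sqlite3 as a stand-in. The table name `readings` and its rows are made up. In SQL Server, LAG is available from the 2012 release onward, so it would not be an option on the 2008 version mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (Id INTEGER PRIMARY KEY, value INTEGER);
    -- Ids deliberately skip values to mimic gaps in a sequence.
    INSERT INTO readings VALUES (1, 10), (2, 15), (5, 12), (9, 20);
""")

# LAG looks at the previous row in Id order, regardless of gaps.
rows = conn.execute("""
    SELECT Id, value - LAG(value) OVER (ORDER BY Id) AS diff
    FROM readings
    ORDER BY Id
""").fetchall()

# The first row has no predecessor, so its difference is NULL (None in Python).
diffs = [diff for _, diff in rows if diff is not None]
print(rows)   # [(1, None), (2, 5), (5, -3), (9, 8)]
print(diffs)  # [5, -3, 8]
```

Filtering out the NULL leaves the n-1 differences the question asks for.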
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey + 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank + 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1 DESC, pk DESC
)
FROM mytable m1
ORDER BY
col1, pk
where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
Update:
Newer versions of SQL Server also have the LAG and LEAD windowing functions, which can be used for this too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
create table #temp (value int, primaryKey int, tempid int identity)
insert into #temp (value, primaryKey)
select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid + 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter previous difference
1 0 1
2 1 1
3 2 1
4 3 1
5 4 1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following window function to get the current row's value and the previous row's value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1 preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer

MSSQL can't understand what's happening with the action "having count(*) lesser than <some field of other table>"

I've been trying to understand part of an exercise I'm doing and just can't get it.
There's a part where 'T' is selected and grouped by 'a', and then it's filtered with "having count(*) < T3.a",
and I don't know how to approach it.
I've tried googling this sort of thing to see if there are similar examples, but all the other examples used plain numbers, e.g. "having count(*) < 5", and not whole fields for the comparison.
The exercise is this:
create table T(a int, b int);
insert into T values(1,2);
insert into T values(1,1);
insert into T values(2,3);
insert into T values(2,4);
insert into T values(3,4);
insert into T values(4,5);
select T3.b, (select count(T5.a)
from T T5
where T5.a = T3.b)
from (select T1.a as a, T2.b as b
from T T1, T T2
where T1.b < T2.a) as T3
where not exists (select T4.a
from T T4
group by T4.a
having count(*) < T3.a);
I thought that the having count(*) was comparing each grouped value to the value of T3.a in each row, and that if all rows met the criteria then the value would get selected, but I somehow get different results.
Can someone please explain to me what is really going on behind this "having count(*) < T3.a" operation?
Thank you in advance.
To repeat myself from the comments, HAVING is like a WHERE for aggregate functions. You cannot use an aggregate function in the WHERE, for example WHERE SUM(SomeColumn) > 5, so you need to put it in the HAVING: HAVING SUM(SomeColumn) > 5. This would return only the groups where the SUM of the column SomeColumn is greater than 5.
For your expression, HAVING COUNT(*) < T3.a, it would only return groups where the value of COUNT(*) is less than the value of T3.a.
Let's break this down into its separate parts.
First the FROM
from (select T1.a as a, T2.b as b
from T T1, T T2
where T1.b < T2.a) as T3
This uses the old-style deprecated cross-join syntax. It can be rewritten as a normal join:
from (select T1.a as a, T2.b as b
from T T1
join T T2 on T1.b < T2.a
) as T3
If we analyze what it does, we realize that it is actually what is known as a triangular join: every row is self-joined to every row lower than it. This was commonly done when window aggregates were not available.
WHERE
where not exists (select T4.a
from T T4
group by T4.a
having count(*) < T3.a);
This is a correlated subquery: T3.a is a reference to the outer query.
What this predicate says is: for this particular row, there must be no rows in the subquery.
The subquery itself says: take all rows in T, group them by a and count, then only include rows for which the count is less than the outer reference a.
Note that because it is an EXISTS, the actual selected value is not used. I suspect this may not have been the intention.
SELECT
select T3.b, (select count(T5.a)
from T T5
where T5.a = T3.b)
We then take b from the first join, and the count from a subquery of all matching T rows. Again, this was common when window aggregates were not available.
So the whole thing can be rewritten as follows:
select T2.b, (select count(T5.a)
from T T5
where T5.a = T2.b)
from T T1
join T T2 on T1.b < T2.a
-- keep a row only if no group of T has fewer rows than T1.a
where T1.a <= (select min(cnt)
from (select count(*) as cnt from T group by a) as g);
There is something not quite right about the logic in your query, but without knowing what the original intention was, and without seeing the table and column names, I cannot say. The triangular join in particular looks very suspect.
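To see what the correlated HAVING does on the exercise data, here is a sketch that runs the original query unchanged in SQLite through Python's sqlite3 (used only as a convenient stand-in for SQL Server). It also prints the per-group counts that the HAVING compares against T3.a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table T(a int, b int);
    insert into T values (1,2),(1,1),(2,3),(2,4),(3,4),(4,5);
""")

# Group sizes that the HAVING clause compares with the outer T3.a.
group_counts = conn.execute(
    "select a, count(*) from T group by a order by a"
).fetchall()

# The exercise query: a T3 row survives only if NO group has count(*) < T3.a.
rows = conn.execute("""
    select T3.b, (select count(T5.a)
                  from T T5
                  where T5.a = T3.b)
    from (select T1.a as a, T2.b as b
          from T T1, T T2
          where T1.b < T2.a) as T3
    where not exists (select T4.a
                      from T T4
                      group by T4.a
                      having count(*) < T3.a)
""").fetchall()

print(group_counts)  # [(1, 2), (2, 2), (3, 1), (4, 1)]
print(sorted(rows))
```

The smallest group has one row, so some group satisfies count(*) < T3.a whenever T3.a is 2 or more. Only the T3 rows with a = 1 pass the NOT EXISTS, which gives six result rows on this data.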

How to divide to multiple column sql?

Based on the SQL result above, I want to divide the result like the image below.
I tried using CASE, but it returns duplicate data.
Anyone have done this or have any idea how to do this?
Can you try this one?
SELECT t1.*, t2.* FROM yourtable t1
JOIN yourtable t2 ON t1.delay_code_1 = t2.delay_code_1
WHERE t1.hatch_num_1 != t2.hatch_num_1
Afterwards you can list exactly which columns you want from both t1 and t2 and use 'as' to name them in your select statement, so instead of having two hatch_num_1 columns you will have one with _1 and one with _3.
;With
a As (SELECT * FROM yourtable X Where X.hatch= 'H1' ),
b AS (SELECT * FROM yourtable Y Where Y.hatch= 'H3')
SELECT A.* ,B.* FROM A , B WHERE A.[delay] = B.[delay]
If you have a limited number of hatches and the same times repeat, then you can do it like this. Otherwise, show me some more records or details so I can understand the data better.

concatenate columns from 2 tables in resultset

Here is simplified version of my schema. Using Sql Server 2012 enterprise edition.
CREATE table #abc (a INT , b INT);
CREATE TABLE #def ( a INT , c INT ,d INT);
INSERT INTO #abc values(1,23),(1,24);
INSERT INTO #def VALUES(1,53,54),(1,56,57)
Table #abc JOINs TO #def ON COLUMN a
Basically, it is a concatenation of rows from both tables based on column a. I tried inner join and cross apply, but they all produce a cross-join kind of result set, understandably. I have a workaround using another temp table (then an update), but I feel this can be done easily in a single select. I am missing something simple here.
Need output like this:
a b c d
1 23 53 54
1 24 56 57
Thanks
-N
You need some sort of sequence number to join the tables together. You can generate one using row_number() as follows:
select a.a, a.b, d.c, d.d
from (select a.*, row_number() over (order by (select NULL)) as seqnum
from #abc a
) a join
(select d.*, row_number() over (order by (select NULL)) as seqnum
from #def d
) d
on a.seqnum = d.seqnum;
Now the caution, caution, caution. The order by clause does not really specify the ordering, so the sequence numbers may not be what you expect. You should really have a column to specify the ordering.
You need to have a unique key value in each row to be able to join the tables in the way you would like. Then, an inner join will return the result set you require.
If you introduce referential integrity between the tables, then this will be enforced and return the expected results.
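A sketch of the sequence-number join with an explicit ordering, run in SQLite (3.25 or later) through Python's sqlite3 as a stand-in for SQL Server. It assumes that ordering #abc by b and #def by c reflects the intended pairing, which the question does not state. The tables are renamed `tbl_abc` and `tbl_def` because SQLite has no `#` temp-table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_abc (a INT, b INT);
    CREATE TABLE tbl_def (a INT, c INT, d INT);
    INSERT INTO tbl_abc VALUES (1, 23), (1, 24);
    INSERT INTO tbl_def VALUES (1, 53, 54), (1, 56, 57);
""")

# Number the rows within each value of a, then pair row n with row n.
# ORDER BY b / ORDER BY c is an assumed ordering, chosen for illustration.
rows = conn.execute("""
    SELECT x.a, x.b, y.c, y.d
    FROM (SELECT a, b,
                 ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS seqnum
          FROM tbl_abc) x
    JOIN (SELECT a, c, d,
                 ROW_NUMBER() OVER (PARTITION BY a ORDER BY c) AS seqnum
          FROM tbl_def) y
      ON x.a = y.a AND x.seqnum = y.seqnum
    ORDER BY x.a, x.seqnum
""").fetchall()

print(rows)  # [(1, 23, 53, 54), (1, 24, 56, 57)]
```

Partitioning by a and ordering by a real column makes the pairing deterministic, unlike ordering by (select NULL).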

OVER (ORDER BY Col) generates 'Sort' operation

I'm working on a query that needs to do filtering, ordering and paging according to the user's input. Now I'm testing a case that's really slow, upon inspection of the Query Plan a 'Sort' is taking 96% of the time.
The datamodel is really not that complicated, the following query should be clear enough to understand what's happening:
WITH OrderedRecords AS (
SELECT
A.Id
, A.col2
, ...
, B.Id
, B.col1
, ROW_NUMBER() OVER (ORDER BY B.col1 ASC) AS RowNumber
FROM A
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
)
SELECT
*
FROM OrderedRecords WHERE RowNumber Between x AND y
A is a table containing about 100k records, but it will grow to tens of millions in the field, while B is a category-type table with 5 items (and it will never grow by more than perhaps a few more). There are clustered indexes on A.Id and B.Id.
Performance is really dreadful and I'm wondering if it's possible to remedy this somehow. If, for example, the ordering is on A.Id instead of B.col1, everything is pretty darn fast. Perhaps I can optimize B.col1 with some sort of index.
I already tried putting an index on the field itself, but this didn't help, probably because the number of distinct items in table B is very small (in itself and compared to A).
Any ideas?
I think this may be part of the problem:
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
Your LEFT JOIN is going to logically act like an INNER JOIN because of the WHERE clause you have in place, since only certain B.ID rows are going to be returned. If that's your intent, then go ahead and use an inner join, which may help the optimizer realize that you are looking for a restricted number of rows.
I suggest you try the following.
For the B table create index:
create index IX_B_1 on B (col1, Id, SomeThing)
For the A table create index:
create index IX_A_1 on A (col2, BId) include (Id, ...)
In the include, put all the other columns of table A that are listed in the SELECT of the OrderedRecords CTE.
However, as you can see, index IX_A_1 takes a lot of space, possibly about as much as the table data itself.
So, as an alternative, you may try omitting the extra columns from the include part of the index:
create index IX_A_2 on A (col2, BId) include (Id)
but in this case you will have to modify your query slightly:
;WITH OrderedRecords AS (
SELECT
AId = A.Id
, A.col2
-- remove other A columns from here
, bid = B.Id
, B.col1
, ROW_NUMBER() OVER (ORDER BY B.col1 ASC) AS RowNumber
FROM A
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
)
SELECT
R.*, A.OtherColumns
FROM OrderedRecords R
join A on A.Id = R.AId
WHERE R.RowNumber Between x AND y

Resources