SQL set operation with different number of columns in each set

Let's say I have set 1:
1 30 60
2 45 90
3 120 240
4 30 60
5 20 40
and set 2:
30 60
20 40
I would like to do some sort of union where I keep only rows 1, 4 and 5 from set 1, because the last 2 columns of those rows can be found in set 2.
My problem is that set-based operations insist on the same number of columns.
I've thought of concatenating the columns contents, but it feels dirty to me.
Is there a 'proper' way to accomplish this?
I'm on SQL Server 2008 R2
In the end, I would like to end up with
1 30 60
4 30 60
5 20 40
Clearly I need to go to sleep, as a simple join on 2 columns worked. Thanks!

You are literally asking for:
give me the rows in t1 where the 2 columns match in T2
So if the output is only rows 1, 4 and 5 from table 1, it is a set-based operation and can be done with EXISTS, INTERSECT or JOIN. As for the "same number of columns" problem, you simply match on 2 conditions combined with AND, which is evaluated per row.
EXISTS is the most portable and compatible way, and it lets you return any column from table1:
select id, val1, val2
from table1 t1
WHERE EXISTS (SELECT * FROM table2 t2
WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
INTERSECT requires the same columns in each clause, and not all engines support it (SQL Server has since 2005):
select val1, val2
from table1
INTERSECT
select val1, val2
from table2
With an INNER JOIN, if you have duplicate values for val1, val2 in table2 then you'll get more rows than expected. The internals of this usually make it slower than EXISTS:
select t1.id, t1.val1, t1.val2
from table1 t1
JOIN
table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2
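The duplicate-row effect described above can be reproduced with a small script. This is only a sketch: the table variables @table1 and @table2 and their contents are made up for illustration, and de-duplicating through a derived table is one possible workaround, not something the answer prescribes.

```sql
DECLARE @table1 TABLE (id int, val1 int, val2 int);
DECLARE @table2 TABLE (val1 int, val2 int);

INSERT INTO @table1 (id, val1, val2) VALUES (1, 30, 60), (5, 20, 40);
-- (30, 60) deliberately appears twice in @table2
INSERT INTO @table2 (val1, val2) VALUES (30, 60), (30, 60), (20, 40);

-- Plain join: id 1 comes back twice, once per matching @table2 row
SELECT t1.id, t1.val1, t1.val2
FROM @table1 t1
JOIN @table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2;

-- Joining to a de-duplicated derived table returns each @table1 row once
SELECT t1.id, t1.val1, t1.val2
FROM @table1 t1
JOIN (SELECT DISTINCT val1, val2 FROM @table2) t2
    ON t1.val1 = t2.val1 AND t1.val2 = t2.val2;
```

The first query returns three rows and the second returns two.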
Some RDBMS support IN on multiple columns: this isn't portable and SQL Server doesn't support it.
Edit: some background
Relationally, it's a semi-join.
SQL Server implements it as a "left semi join".
INTERSECT and EXISTS in SQL Server usually give the same execution plan: their join type is a "left semi join", whereas INNER JOIN is a full "equi-join".
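To try the EXISTS and INTERSECT variants against the data from the question, a script along these lines should do. The table variables and the column names id, val1 and val2 are assumptions that mirror the snippets above; the question does not name its columns.

```sql
DECLARE @table1 TABLE (id int, val1 int, val2 int);
DECLARE @table2 TABLE (val1 int, val2 int);

INSERT INTO @table1 (id, val1, val2)
VALUES (1, 30, 60), (2, 45, 90), (3, 120, 240), (4, 30, 60), (5, 20, 40);
INSERT INTO @table2 (val1, val2)
VALUES (30, 60), (20, 40);

-- Semi-join with EXISTS: keeps the id column, returns ids 1, 4 and 5
SELECT id, val1, val2
FROM @table1 t1
WHERE EXISTS (SELECT *
              FROM @table2 t2
              WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
ORDER BY id;

-- INTERSECT: only the shared columns, and only distinct pairs,
-- so this returns (30, 60) and (20, 40) without any id
SELECT val1, val2 FROM @table1
INTERSECT
SELECT val1, val2 FROM @table2;
```

Because INTERSECT removes duplicates and cannot carry the id column, EXISTS is the variant that reproduces the output the question asks for.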

You could use union which, as opposed to union all, eliminates duplicates:
select val1, val2
from table1
union
select val1, val2
from table2
EDIT: Based on your edited question, you can exclude rows that match the second table using a not exists subquery:
select id, col1, col2
from table1 t1
where not exists
(
select *
from table2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
)
union all
select null, col1, col2
from table2
If you'd like to exclude rows from table2, omit union all and everything below it.

Related

I'm attempting to self-join outer join based on dates to return numeric differences - MS SQL

I'm attempting to self join this table on the ID and Task_ID field, joining yesterday's date with today's date, so that I can return items that have increased or decreased between yesterday and today. I also want to return items that have dropped off from yesterday, or are new as of today. I think I want to do an outer self-join, if that makes sense.
I currently have this query working by joining two temp tables together, one with today's date and one with yesterday's date, and then subtracting yesterday's number from today's number, returning any nulls or non-zero difference values. However, I want to increase my SQL skills, but I keep running into problems with this query. Thank you in advance for your help!
My Data looks like this:
ID  task_id  number  date_stored
2   BH1      6       01/18/2021
2   BH1      5       01/19/2021
7   AK9      3       01/18/2021
7   KL8      2       01/19/2021
7   KL8      2       01/18/2021
I tried this, but it (I believe) implicitly assumed an inner join and excluded the dropped and new items:
SELECT
*
FROM
table t1, table t2
WHERE
t1.id = t2.id
and
t1.task_id = t2.task_id
and
t1.date_stored != t2.date_stored
Then I tried this:
SELECT
T1.*,
t2.number as today_no,
t1.number - t2.number as diff_btw
FROM
table t1 LEFT JOIN table as t2
on t1.ID = t2.ID
and t1.task_id = t2.task_id
and t1.date_stored <> t2.date_stored
WHERE
t1.date_stored = '2021-01-18'
and
t1.number - t2.number IS NULL
or
t1.number - t2.number <> 0

As @Larnu said, use LAG:
WITH cte As
(
SELECT
ID,
task_id,
date_stored,
number As today_no,
LAG(number) OVER (PARTITION BY ID, task_id ORDER BY date_stored) As yesterday_no
FROM
table
)
SELECT
ID,
task_id,
date_stored,
today_no,
yesterday_no,
today_no - yesterday_no As diff_btw
FROM
cte
WHERE
yesterday_no Is Null
Or
today_no != yesterday_no
ORDER BY
task_id,
date_stored
;
LAG (Transact-SQL) - SQL Server | Microsoft Docs
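If the outer self-join described in the question is still wanted, for example to see items that dropped off since yesterday, one possible sketch is a FULL OUTER JOIN between the two days. The table variable @t stands in for the real table and the two dates are hard-coded; both are assumptions made for illustration.

```sql
DECLARE @t TABLE (ID int, task_id varchar(10), number int, date_stored date);

INSERT INTO @t (ID, task_id, number, date_stored) VALUES
(2, 'BH1', 6, '2021-01-18'),
(2, 'BH1', 5, '2021-01-19'),
(7, 'AK9', 3, '2021-01-18'),
(7, 'KL8', 2, '2021-01-19'),
(7, 'KL8', 2, '2021-01-18');

SELECT
    COALESCE(y.ID, t.ID)           AS ID,
    COALESCE(y.task_id, t.task_id) AS task_id,
    y.number                       AS yesterday_no,
    t.number                       AS today_no,
    t.number - y.number            AS diff_btw
FROM
    (SELECT ID, task_id, number FROM @t WHERE date_stored = '2021-01-18') AS y
    FULL OUTER JOIN
    (SELECT ID, task_id, number FROM @t WHERE date_stored = '2021-01-19') AS t
        ON y.ID = t.ID AND y.task_id = t.task_id
WHERE
    y.number IS NULL            -- new as of today
    OR t.number IS NULL         -- dropped off since yesterday
    OR t.number <> y.number     -- increased or decreased
ORDER BY
    ID, task_id;
```

With the sample data this returns BH1 (6 yesterday, 5 today, difference -1) and AK9 (3 yesterday, nothing today); KL8 is unchanged and is filtered out. Filtering each day in its own derived table before joining avoids the problem in the LEFT JOIN attempt, where conditions in the WHERE clause discard the unmatched rows.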

SQL query performance for huge data

I have a query:
SELECT c.somecolumn,p.someothercolumn
FROM table1 co
INNER JOIN table2 p(NOLOCK) ON co.COLUMN = p.COLUMN
INNER JOIN table3 c(NOLOCK) ON co.column11 = c.column11
WHERE co.filterColumn = 1
Table2 is a junction table and the join between table1 and table2 is on a column without distinct values (that’s the requirement and can't be changed) and hence there are cross joins.
Output of this query results in 180 million records.
Record count:
table 1: 2 190 561
table 2: 568 277
table 3: 300 150
How to optimize the above query? Execution plan:

Make sure you at least have indexes on the columns in the joins that include the columns you're returning. For example, in table2 you should have a non-clustered index that is keyed on "p.COLUMN" and includes "p.someothercolumn". For table3, key on c.column11 and include c.somecolumn. You should also have an index on table1.filtercolumn.
Consider also that you have to return 180 million rows to the caller, and that takes time. Try inserting that data into a throwaway table instead, just to keep the network transfer time out of your equation.

Ideally, these are the indexes that would be required:
For table1 - filtered index on COLUMN and column11 where co.filterColumn = 1
For table2 - Index on COLUMN include someothercolumn
For table3 - Index on column11 include somecolumn
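As a rough sketch, the three indexes listed above could be created as follows. The index names are hypothetical, the statements assume the tables and columns exist exactly as named in the question, and COLUMN is bracketed because it is a reserved word in T-SQL.

```sql
-- table1: filtered index covering only the rows the query reads
CREATE NONCLUSTERED INDEX IX_table1_COLUMN_column11
    ON table1 ([COLUMN], column11)
    WHERE filterColumn = 1;

-- table2: keyed on the join column, carrying the returned column
CREATE NONCLUSTERED INDEX IX_table2_COLUMN
    ON table2 ([COLUMN])
    INCLUDE (someothercolumn);

-- table3: keyed on the join column, carrying the returned column
CREATE NONCLUSTERED INDEX IX_table3_column11
    ON table3 (column11)
    INCLUDE (somecolumn);
```

Whether the optimizer actually picks these up is worth checking in the execution plan after creating them.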

SELECT c.somecolumn
,tmp.someothercolumn
FROM table1 co
INNER JOIN table3 c(NOLOCK) ON co.column11 = c.column11
AND co.filterColumn = 1
CROSS APPLY (SELECT TOP (1) p.SomeOtherColumn
FROM table2 p(NOLOCK)
WHERE p.COLUMN = co.Column) tmp

Whats the difference between UNION and CROSS JOIN?

I have been reading about these two possibilities but I'm not sure if I understood them properly.
So UNION makes new rows with the union of the two queries data:
Table1 Table2
------ ------
1 3
2 4
SELECT * FROM TABLE1
UNION
SELECT * FROM TABLE2
Column1
---------
1
2
3
4
...
And CROSS JOIN makes Cartesian product of both tables:
SELECT * FROM TABLE1
CROSS JOIN TABLE2
Column1 | Column2
-----------------
1 | 3
2 | 3
1 | 4
2 | 4
Is that right?
Thanks for all.

The UNION operator is used to combine the result-set of two or more SELECT statements.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
If no relationship between the tables is specified, the result is a cross join.

Others have already answered this question, but I just wanted to highlight that your example of the CROSS JOIN should return 4 and not 2 records.
Cross joins return every combination of records from the left of the join against the right.
Example Data
/* Table variables are a great way of sharing
* test data.
*/
DECLARE @T1 TABLE
(
Column11 INT
)
;
DECLARE @T2 TABLE
(
Column21 INT
)
;
INSERT INTO @T1
(
Column11
)
VALUES
(1),
(2)
;
INSERT INTO @T2
(
Column21
)
VALUES
(3),
(4)
;
UNION Query
/* UNION appends one recordset to the end of another,
* and then deduplicates the result.
*/
SELECT
*
FROM
@T1
UNION
SELECT
*
FROM
@T2
;
Returns
Column11
1
2
3
4
CROSS JOIN Query
/* CROSS JOIN returns every combination of table 1
* and table 2.
* The second recordset is appended to the right of the first.
*/
SELECT
*
FROM
@T1
CROSS JOIN @T2
;
Returns
Column11 Column21
1 3
2 3
1 4
2 4
It is also important to note that UNION needs exactly the same number of columns from both tables, while CROSS JOIN does not.
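When the two sides do not have the same number of columns, a common workaround is to pad the narrower SELECT with NULL. A minimal sketch, with made-up table variables and column names:

```sql
DECLARE @Wide   TABLE (Id int, Val int);
DECLARE @Narrow TABLE (Val int);

INSERT INTO @Wide (Id, Val) VALUES (1, 10), (2, 20);
INSERT INTO @Narrow (Val) VALUES (30);

-- Both branches now return two columns; NULL fills the missing Id
SELECT Id, Val
FROM @Wide
UNION ALL
SELECT NULL, Val
FROM @Narrow;
```

This returns three rows, the last one with a NULL Id. The column names of the result come from the first SELECT.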

It is worth mentioning that, unlike joins with a join condition, UNION and CROSS JOIN do not drop rows for lack of a match between the two tables (although UNION, unlike UNION ALL, does remove duplicate rows). Inner and left/right joins eliminate some part of the data that is not shared.

T-SQL: Remove 1 row out of 2 rows where the value of 1 column is double that of the second

Given the following 2 rows of data:-
ColumnA ColumnB ColumnC ColumnD
33 10298 11588 4474.32
33 10298 11588 2237.16
How do I go about writing a T-SQL query that removes only the first data row, where ColumnA to ColumnC are the same and the value in ColumnD is double that of the second data row?
It doesn't have to be particularly performant, as I am only removing approximately 500 rows.

Something along these lines should work:
DELETE FROM t2
FROM table t1
inner join
table t2
on
t1.ColumnA = t2.ColumnA and
t1.ColumnB = t2.ColumnB and
t1.ColumnC = t2.ColumnC and
t1.ColumnD * 2 = t2.ColumnD
This assumes that if you have 3 rows where their ratios between columnD values are 1 : 2 : 4, you want to delete both the 2 and 4 rows. If that's not the case, please consider such a situation and let me know what should happen there.
DELETE documentation
Complete script:
create table T (A int,B int, C int, D int)
insert into T(A,B,C,D)
values (1,2,3,4),(1,2,3,8)
delete from t2
from t t1
inner join
t t2
on
t1.A = t2.A and t1.B = t2.B and t1.C = t2.C and t1.D * 2 = t2.D
select * from T
Result:
A B C D
----------- ----------- ----------- -----------
1 2 3 4

Try this solution:
delete from YourTable
from YourTable t1
where exists (select 1 from YourTable t2 where t1.ColumnA=t2.ColumnA and t1.ColumnB=t2.ColumnB and t1.ColumnC=t2.ColumnC and t1.ColumnD=t2.ColumnD*2)
You can't use the same table two times in a join statement if you want to delete from that table. So instead of the join, use an EXISTS statement or join a derived table.
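A runnable version of the EXISTS approach, using the two rows from the question, might look like this. The table variable @YourTable and the decimal(10, 2) type for ColumnD are assumptions, since the question does not give the table definition.

```sql
DECLARE @YourTable TABLE
(
    ColumnA int,
    ColumnB int,
    ColumnC int,
    ColumnD decimal(10, 2)
);

INSERT INTO @YourTable (ColumnA, ColumnB, ColumnC, ColumnD)
VALUES (33, 10298, 11588, 4474.32),
       (33, 10298, 11588, 2237.16);

-- Delete the row whose ColumnD is exactly double that of another row
-- with the same ColumnA, ColumnB and ColumnC
DELETE t1
FROM @YourTable AS t1
WHERE EXISTS (SELECT 1
              FROM @YourTable AS t2
              WHERE t1.ColumnA = t2.ColumnA
                AND t1.ColumnB = t2.ColumnB
                AND t1.ColumnC = t2.ColumnC
                AND t1.ColumnD = t2.ColumnD * 2);

SELECT ColumnA, ColumnB, ColumnC, ColumnD
FROM @YourTable;
```

Only the 2237.16 row remains. Note that a row with ColumnD = 0 would match itself (0 = 0 * 2) and be deleted, so an extra condition such as t2.ColumnD <> 0 may be needed if zeros can occur.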

SQL Server - Finding bitwise OR values using query

I have a business requirement to search through a database table where one of the columns is a bitwise integer and remove the base values from a table.
For example, assume my table/resultset looks like this currently:
Value
-----
1
2
16
32
33
Notice that 33 is present, which is also 32 | 1. What I need to do is remove the 1 and 32 values to return this:
Value
-----
2
16
33
Obviously I could do this with looping constructs in SQL - or even in my business logic in C# - but I'm wondering whether this is at all possible using a query?

Here's a query that ought to work.
DELETE myTable WHERE myTable.Value in
(SELECT T1.Value
FROM myTable T1
CROSS JOIN myTable T2
WHERE T1.Value<>T2.Value AND T1.Value<T2.Value AND ((T1.Value | T2.VALUE)=T2.Value))
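The same idea can be tried end to end on the sample values with a table variable. In this sketch the name @myTable is an assumption, and the IN subquery is rewritten as a correlated EXISTS, which is a substitution of mine and not the answer's exact form.

```sql
DECLARE @myTable TABLE (Value int);

INSERT INTO @myTable (Value) VALUES (1), (2), (16), (32), (33);

-- Remove every value whose bits are all contained in some larger value
DELETE t
FROM @myTable AS t
WHERE EXISTS (SELECT 1
              FROM @myTable AS big
              WHERE big.Value > t.Value
                AND (t.Value | big.Value) = big.Value);

SELECT Value
FROM @myTable
ORDER BY Value;
```

This leaves 2, 16 and 33. Be aware that a Value of 0 would always be removed whenever any larger value exists, because 0 | x = x for every x.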

Try:
with cte(Value) as (
select 1 as Value
union all
select 2
union all
select 16
union all
select 32
union all
select 33
)
--select values to be removed
select x1.Value
from cte x1
inner join cte x2 on x1.Value <> x2.Value
inner join cte x3 on x1.Value <> x3.Value and x2.Value <> x3.Value
where x1.Value | x2.Value = x3.Value
