What's the difference between UNION and CROSS JOIN? - sql-server

I have been reading about these two possibilities, but I'm not sure I understood them properly.
So UNION makes new rows from the union of the two queries' data:
Table1 Table2
------ ------
1 3
2 4
SELECT * FROM TABLE1
UNION
SELECT * FROM TABLE2
Column1
---------
1
2
3
4
...
And CROSS JOIN makes Cartesian product of both tables:
SELECT * FROM TABLE1
CROSS JOIN TABLE2
Column1 | Column2
-----------------
1 | 3
2 | 3
1 | 4
2 | 4
Is that OK?
Thanks for all.

The UNION operator is used to combine the result-set of two or more SELECT statements.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
If no join condition relates the tables, the result is a cross join.

Others have already answered this question, but I just wanted to highlight that your example of the CROSS JOIN should return 4 and not 2 records.
Cross joins return every combination of records from the left of the join against the right.
Example Data
/* Table variables are a great way of sharing
* test data.
*/
DECLARE @T1 TABLE
(
Column11 INT
)
;
DECLARE @T2 TABLE
(
Column21 INT
)
;
INSERT INTO @T1
(
Column11
)
VALUES
(1),
(2)
;
INSERT INTO @T2
(
Column21
)
VALUES
(3),
(4)
;
UNION Query
/* UNION appends one recordset to the end of another,
* and then deduplicates the result.
*/
SELECT
*
FROM
@T1
UNION
SELECT
*
FROM
@T2
;
Returns
Column11
1
2
3
4
CROSS JOIN Query
/* CROSS JOIN returns every combination of table 1
* and table 2.
* The second recordset is appended to the right of the first.
*/
SELECT
*
FROM
@T1
CROSS JOIN @T2
;
Returns
Column11 Column21
1 3
2 3
1 4
2 4
It is also important to note that UNION needs exactly the same number of columns from both tables, while CROSS JOIN does not.
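A minimal sketch of the column-count rule, assuming two hypothetical table variables, @A with one column and @B with two (neither appears in the question):

```sql
-- Hypothetical tables: @A has one column, @B has two.
DECLARE @A TABLE (X INT);
DECLARE @B TABLE (Y INT, Z INT);

INSERT INTO @A (X) VALUES (1), (2);
INSERT INTO @B (Y, Z) VALUES (3, 30), (4, 40);

-- CROSS JOIN accepts any column counts: the result has X, Y and Z.
SELECT * FROM @A CROSS JOIN @B;

-- UNION needs the same number of columns on both sides, so this
-- statement would be rejected if uncommented:
-- SELECT * FROM @A UNION SELECT * FROM @B;

-- Selecting one compatible column from each side is accepted.
SELECT X FROM @A
UNION
SELECT Y FROM @B;
```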

It is worth mentioning that, unlike other joins, UNION and CROSS JOIN return all the information from both tables (UNION drops only exact duplicate rows). Inner and left/right joins, by contrast, eliminate some of the data that is not shared.
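The difference can be seen side by side in a small sketch. The table variables @L and @R and their values are made up for illustration, with the value 2 deliberately present in both:

```sql
-- Hypothetical data: the value 2 exists on both sides.
DECLARE @L TABLE (V INT);
DECLARE @R TABLE (V INT);
INSERT INTO @L (V) VALUES (1), (2);
INSERT INTO @R (V) VALUES (2), (3);

-- CROSS JOIN: every pairing, 2 x 2 = 4 rows.
SELECT l.V AS LeftV, r.V AS RightV
FROM @L l CROSS JOIN @R r;

-- INNER JOIN: only pairings that satisfy the condition,
-- here the single row (2, 2).
SELECT l.V AS LeftV, r.V AS RightV
FROM @L l INNER JOIN @R r ON l.V = r.V;

-- UNION ALL keeps all 4 input rows.
SELECT V FROM @L UNION ALL SELECT V FROM @R;

-- UNION removes the duplicate 2 and returns 3 rows.
SELECT V FROM @L UNION SELECT V FROM @R;
```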

Related

SQL Server skip duplicate relations (parent child) in recursion

I have a tree in which a specific node can appear under more than one parent (node 2 in my example):
        1
       / \
      2   3
     / \   \
    4   5   6
             \
              2
             / \
            4   5
Notice that 2 appears twice: first under 1, and again under 6.
My recursion is:
with cte (ParentId, ChildId, Field1, Field2) AS (
select BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2
from BillOfMaterials BOM
WHERE ParentId=x
UNION ALL
SELECT BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2 FROM BillOfMaterials BOM
JOIN cte on BOM.ParentId = cte.ChildId
)
select * from cte;
But the problem is that in the result, relations 2-4 and 2-5 are duplicated (first via relation 1-2 and then via relation 6-2):
ParentId ChildId OtherFields
1 2
1 3
2 4 /*from 1-2*/
2 5 /*from 1-2*/
3 6
6 2
2 4 /*from 6-2*/
2 5 /*from 6-2*/
Is there any way to skip visiting duplicated relationships? I do not see why the recursion should run over rows that are already in the result. Skipping them would be faster. Something like this:
with cte (ParentId, ChildId, Field1, Field2) AS (
select BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2
from BillOfMaterials BOM
WHERE ParentId=x
UNION ALL
SELECT BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2 FROM BillOfMaterials BOM
JOIN cte on BOM.ParentId = cte.ChildId
------> WHERE (select count(*) FROM SoFarCollectedResult WHERE ParentId=BOM.ParentId AND ChildId=BOM.ChildId ) = 0
)
select * from cte;
I found this thread, but it is 8 years old.
I am using SQL server 2016.
If this is not possible, then my question is how can I remove duplicates from final result, but check distinct only on ParentId and ChildId columns?
Edited:
Expected result is:
ParentId ChildId OtherFields
1 2
1 3
2 4
2 5
3 6
6 2
You can, by adding two little tricks to the SQL.
But you need an extra Id column with a sequential number.
For example an identity column, or a datetime field that shows when the record was inserted.
The simple reason is that, as far as the database is concerned, records have no insertion order unless a column records that order.
Trick 1) Join the CTE record only to Ids that are higher. If they were lower, they would be the duplicates you don't want to join.
Trick 2) Use the window function ROW_NUMBER to keep only the rows nearest to the Id the recursion started from.
Example:
declare @BillOfMaterials table (Id int identity(1,1) primary key, ParentId int, ChildId int, Field1 varchar(8), Field2 varchar(8));
insert into @BillOfMaterials (ParentId, ChildId, Field1, Field2) values
(1,2,'A','1-2'),
(1,3,'B','1-3'),
(2,4,'C','2-4'), -- from 1-2
(2,5,'D','2-5'), -- from 1-2
(3,6,'E','3-6'),
(6,2,'F','6-2'),
(2,4,'G','2-4'), -- from 6-2
(2,5,'H','2-5'); -- from 6-2
;with cte AS
(
select Id as BaseId, 0 as Level, BOM.*
from @BillOfMaterials BOM
WHERE ParentId in (1)
UNION ALL
SELECT CTE.BaseId, CTE.Level + 1, BOM.*
FROM cte
JOIN @BillOfMaterials BOM on (BOM.ParentId = cte.ChildId and BOM.Id > CTE.Id)
)
select ParentId, ChildId, Field1, Field2
from (
select *
--, row_number() over (partition by BaseId, ParentId, ChildId order by Id) as RNbase
, row_number() over (partition by ParentId, ChildId order by Id) as RN
from cte
) q
where RN = 1
order by ParentId, ChildId;
Result:
ParentId ChildId Field1 Field2
-------- ------- ------ ------
1 2 A 1-2
1 3 B 1-3
2 4 C 2-4
2 5 D 2-5
3 6 E 3-6
6 2 F 6-2
Anyway, as a sidenote, normally a Parent-Child relation table is used differently.
More often it's just a table with unique Parent-Child combinations that are foreign keys to another table where that Id is a primary key. So that the other fields are kept in that other table.
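That layout could look roughly like the sketch below. The Item table, the constraint name, and the column types are assumptions made for illustration; only BillOfMaterials, ParentId, ChildId, Field1 and Field2 come from the question.

```sql
-- Hypothetical schema: item attributes live in Item, and
-- BillOfMaterials holds only unique parent-child pairs.
CREATE TABLE Item
(
    ItemId INT NOT NULL PRIMARY KEY,
    Field1 VARCHAR(8) NULL,
    Field2 VARCHAR(8) NULL
);

CREATE TABLE BillOfMaterials
(
    ParentId INT NOT NULL REFERENCES Item (ItemId),
    ChildId  INT NOT NULL REFERENCES Item (ItemId),
    -- The composite key prevents storing the pair 2-4 twice.
    CONSTRAINT PK_BillOfMaterials PRIMARY KEY (ParentId, ChildId)
);
```

With this design the table itself cannot hold the same pair twice. The recursive query can still reach node 2 along two paths, though, so a final deduplication step may still be needed.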
Change your last query from:
select * from cte;
To:
select ParentId, ChildId from cte group by ParentId, ChildId;
This takes what you have right now and goes one step further, collapsing rows whose ParentId and ChildId have already appeared, which takes care of your duplicate problem. SQL Server rejects select * with GROUP BY unless every returned column is grouped, so list only ParentId and ChildId; if you need other columns, either add them to the GROUP BY or wrap them in an aggregate (max, min, count...) so the query can still group.
Should you have more columns that you can't aggregate or group on, you could write the query as such:
select * from cte where ID in (select MAX(ID) from cte group by ParentId, ChildId);
Here ID is the primary key of the source table, which the cte must also return. This takes the maximum ID among matching rows, which would normally be your latest entry; if you want the earliest entry, just change MAX() to MIN().

T-SQL: Remove 1 row out of 2 rows where the value of 1 column is double that of the second

Given the following 2 rows of data:-
ColumnA ColumnB ColumnC ColumnD
33 10298 11588 4474.32
33 10298 11588 2237.16
How do I go about writing a T-SQL query which will remove only the first data row, where ColumnA through ColumnC are the same and the value in ColumnD is double that of the second data row?
It doesn't have to be particularly performant, as I am only removing approximately 500 rows.
Something along these lines should work:
DELETE FROM t2
FROM table t1
inner join
table t2
on
t1.ColumnA = t2.ColumnA and
t1.ColumnB = t2.ColumnB and
t1.ColumnC = t2.ColumnC and
t1.ColumnD * 2 = t2.ColumnD
This assumes that if you have 3 rows where their ratios between columnD values are 1 : 2 : 4, you want to delete both the 2 and 4 rows. If that's not the case, please consider such a situation and let me know what should happen there.
DELETE documentation
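The 1 : 2 : 4 case mentioned above can be tried out with a small sketch. The table variable @T and its values are made up for illustration; the join condition is the same as in the answer.

```sql
-- Hypothetical test data with D values in the ratio 1 : 2 : 4.
DECLARE @T TABLE (A INT, B INT, C INT, D INT);
INSERT INTO @T (A, B, C, D)
VALUES (1, 2, 3, 10), (1, 2, 3, 20), (1, 2, 3, 40);

-- t2 is deleted whenever some t1 row has exactly half its D.
DELETE t2
FROM @T t1
INNER JOIN @T t2
    ON t1.A = t2.A AND t1.B = t2.B AND t1.C = t2.C
   AND t1.D * 2 = t2.D;

-- Only the D = 10 row is left: 20 is double 10, and 40 is double 20.
SELECT * FROM @T;
```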
Complete script:
create table T (A int,B int, C int, D int)
insert into T(A,B,C,D)
values (1,2,3,4),(1,2,3,8)
delete from t2
from t t1
inner join
t t2
on
t1.A = t2.A and t1.B = t2.B and t1.C = t2.C and t1.D * 2 = t2.D
select * from T
Result:
A B C D
----------- ----------- ----------- -----------
1 2 3 4
Try this solution:
delete from YourTable
from YourTable t1
where exists (select 1 from YourTable t2 where t1.ColumnA=t2.ColumnA and t1.ColumnB=t2.ColumnB and t1.ColumnC=t2.ColumnC and t1.ColumnD=t2.ColumnD*2)
SQL Server does accept the same table twice in a DELETE ... FROM join (the complete script above relies on it), so treat this as an alternative: use an EXISTS subquery, or join a derived table, instead of the self-join.

Convert varchar to int, if error then return original value?

For example, I have 2 tables like:
Table1:
Column1
0001
0002
000a
Table2:
Column2
0001
0002
000a
Both Column1 and Column2 have the data type varchar(10).
I have to join the 2 tables, so my query should be
Select * from Table1 join Table2 on Table1.Column1 = Table2.Column2
But, as we know, joining 2 tables on varchar columns is much slower than joining them on numeric columns (my tables have millions of rows). So I thought about trying
Select * from Table1 join Table2 on Cast(Table1.Column1 as int) =
Cast(Table2.Column2 as int)
Normally, it works fine and is much faster. But if a value raises an exception, as in row 3 (000a), my query breaks. So I want a query like:
Select * from Table1 join Table2 on
try
Cast(Table1.Column1 as int) = Cast(Table2.Column2 as int)
catch if exception then
Table1.Column1 = Table2.Column2
Update:
I have an idea:
First, use try_cast to select the rows that hold numeric data:
select * from Table1 t1
join Table2 t2
on try_cast(t1.Column1 as bigint) =try_cast(t2.Column2 as bigint)
After that, select the rows where try_cast returns NULL (the exception rows):
select * from
(select * from Table1 t1 where try_cast(t1.Column1 as bigint) is NULL and
t1.Column1 is not NULL) as table1
join
(select * from Table2 t2 where try_cast(t2.Column2 as bigint) is NULL and
t2.Column2 is not NULL) as table2
on table1.Column1=table2.Column2
Finally, union all the 2 results together and I get what I want. I ran a test and it is pretty fast. If there is anything wrong with my idea, or something I forgot, please let me know!
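Put together, the plan from the update might look like the sketch below. It assumes TRY_CAST is available (SQL Server 2012 or later) and reuses the table and column names from the question. It is one possible way to write it, not a tuned solution.

```sql
-- Part 1: rows where both sides convert to a number.
SELECT t1.Column1, t2.Column2
FROM Table1 t1
JOIN Table2 t2
    ON TRY_CAST(t1.Column1 AS BIGINT) = TRY_CAST(t2.Column2 AS BIGINT)

UNION ALL

-- Part 2: rows where neither side converts; compare the text directly.
SELECT t1.Column1, t2.Column2
FROM Table1 t1
JOIN Table2 t2
    ON t1.Column1 = t2.Column2
WHERE TRY_CAST(t1.Column1 AS BIGINT) IS NULL
  AND TRY_CAST(t2.Column2 AS BIGINT) IS NULL;
```

One thing to check against the real data: the numeric comparison treats '0001' and '1' as equal, so this can return pairs that the plain varchar join would not.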
If you want to join using integers, there is a fast approach under some conditions, but you must know your data: convert your values to numbers. The idea of deriving the numbers works like this:
SELECT Column1,
ASCII(SUBSTRING(Column1, 1, 1)) * 256 * 256 * 256
+ ASCII(SUBSTRING(Column1, 2, 1)) * 256 * 256
+ ASCII(SUBSTRING(Column1, 3, 1)) * 256
+ ASCII(SUBSTRING(Column1, 4, 1)) AS Column1Num
FROM Table1
result:
Column1 Column1Num
---------- -----------
0001 808464433
0002 808464434
000a 808464481
This sticks with your example, where all values are 4 characters long. The key is to find an expression that yields a unique number for every unique value and still applies to all your records. Then use this expression as the key for your JOIN.
If not all values have the same length, you can compensate (e.g. by padding with the SPACE() function).
If your digits are actually hexadecimal (0123456789abcdef), you can reduce the multiplier from 256 to 16, provided you convert every digit to a value 0..15 before applying the multiplier.
This is just to show you an alternative approach: knowing your data can reveal the optimum formula, but it can also show that no such calculation is possible.
Main tip: evaluate the performance of all approaches ([1] joining varchars, [2] the approach from your question update, [3] the approach suggested in this answer) and pick the fastest.

SQL Server - Finding bitwise OR values using query

I have a business requirement to search through a database table where one of the columns is a bitwise integer and remove the base values from a table.
For example, assume my table/resultset looks like this currently:
Value
-----
1
2
16
32
33
Notice that 33 is present, which is also 32 | 1. What I need to do is remove the 1 and 32 values to return this:
Value
-----
2
16
33
Obviously I could do this with looping constructs in SQL - or even in my business logic in C# - but I'm wondering whether this is at all possible using a query?
Here's a query that ought to work.
DELETE myTable WHERE myTable.Value in
(SELECT T1.Value
FROM myTable T1
CROSS JOIN myTable T2
WHERE T1.Value<T2.Value AND ((T1.Value | T2.Value)=T2.Value))
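To see why that condition picks 1 and 32 but leaves 2 and 16 alone, the OR results against 33 can be checked directly. This is only an illustration using the sample values from the question; the column aliases are made up.

```sql
-- x | y = y holds exactly when every bit set in x is also set in y.
SELECT
    1  | 33 AS OneOr33,        -- 33: bit 1 is already set in 33
    32 | 33 AS ThirtyTwoOr33,  -- 33: bit 32 is already set in 33
    2  | 33 AS TwoOr33,        -- 35: bit 2 is not set in 33
    16 | 33 AS SixteenOr33;    -- 49: bit 16 is not set in 33
```

Note that the test removes any value whose bits are all contained in some larger value in the table, whether or not the larger value is the OR of two rows that are actually present.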
Try:
with cte(Value) as (
select 1 as Value
union all
select 2
union all
select 16
union all
select 32
union all
select 33
)
--select values to be removed
select x1.Value
from cte x1
inner join cte x2 on x1.Value <> x2.Value
inner join cte x3 on x1.Value <> x3.Value and x2.Value <> x3.Value
where x1.Value | x2.Value = x3.Value

SQL set operation with different number of columns in each set

Let say I have set 1:
1 30 60
2 45 90
3 120 240
4 30 60
5 20 40
and set 2
30 60
20 40
I would like to do some sort of union where I only keep rows 1,4,5 from set 1 because the latter 2 columns of set 1 can be found in set 2.
My problem is that set-based operations insist on the same number of columns.
I've thought of concatenating the columns contents, but it feels dirty to me.
Is there a 'proper' way to accomplish this?
I'm on SQL Server 2008 R2
In the end, I would like to end up with
1 30 60
4 30 60
5 20 40
CLEARLY I need to go to sleep, as a simple join on 2 columns worked... Thanks!
You are literally asking for
give me the rows in t1 where the 2 columns match in T2
So if the output is only rows 1, 4 and 5 from table 1, then it is a set-based operation and can be done with EXISTS, INTERSECT or JOIN. For the "same number of columns" problem, you simply combine 2 conditions with AND. This is evaluated per row.
EXISTS is the most portable and compatible way and allows any column from table1
select id, val1, val2
from table1 t1
WHERE EXISTS (SELECT * FROM table2 t2
WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
INTERSECT requires the same columns in each clause, and not all engines support it (SQL Server has since 2005)
select val1, val2
from table1
INTERSECT
select val1, val2
from table2
With an INNER JOIN, if you have duplicate values for val1, val2 in table2, then you'll get more rows than expected. The internals of this usually make it slower than EXISTS
select t1.id, t1.val1, t1.val2
from table1 t1
JOIN
table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2
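The duplicate-row effect is easy to reproduce. The sketch below uses hypothetical table variables standing in for table1 and table2, with the pair (30, 60) deliberately stored twice in the second one:

```sql
-- Hypothetical data: @table2 contains the pair (30, 60) twice.
DECLARE @table1 TABLE (id INT, val1 INT, val2 INT);
DECLARE @table2 TABLE (val1 INT, val2 INT);
INSERT INTO @table1 (id, val1, val2) VALUES (1, 30, 60), (2, 45, 90);
INSERT INTO @table2 (val1, val2) VALUES (30, 60), (30, 60);

-- INNER JOIN: id 1 comes back twice, once per matching row in @table2.
SELECT t1.id, t1.val1, t1.val2
FROM @table1 t1
JOIN @table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2;

-- EXISTS: id 1 comes back once, however many matches there are.
SELECT t1.id, t1.val1, t1.val2
FROM @table1 t1
WHERE EXISTS (SELECT * FROM @table2 t2
              WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2);
```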
Some RDBMSs support IN on multiple columns: this isn't portable and SQL Server doesn't support it
Edit: some background
Relationally, it's a semi-join.
SQL Server does it as a "left semi join"
INTERSECT and EXISTS in SQL Server usually give the same execution plan. The join type is a "left semi join" whereas INNER JOIN is a full "equi-join".
You could use union which, as opposed to union all, eliminates duplicates:
select val1, val2
from table1
union
select val1, val2
from table2
EDIT: Based on your edited question, you can exclude rows that match the second table using a not exists subquery:
select id, col1, col2
from table1 t1
where not exists
(
select *
from table2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
)
union all
select null, col1, col2
from table2
If you'd like to exclude rows from table2, omit union all and everything below it.
