CROSS JOIN with one empty table sql server - sql-server

I have two tables say MyFull and MyEmpty. I need to do the combination of all records from both tables. Sometimes, MyEmpty table might have no records, in this scenario I need to return all records from MyFull. This is what I have tried:
--Q1 >> returns no results
SELECT *
FROM MyFull CROSS JOIN MyEmpty
--Q2 >> returns no results
SELECT *
FROM MyFull, MyEmpty
I though of using LEFT JOIN but I have no common key to join on. This is a SQLFiddle

Try this:
SELECT *
FROM MyFull
LEFT JOIN MyEmpty ON 1=1
Though be aware that if you have multiple records in MyEmpty, it will repeat records from MyFull once for every record in MyEmpty. Effectively, the number of records in the results is MyFull * MyEmpty (unless MyEmpty has no records).

Full table :
declare #t table (Spot int,name varchar(1),pct int)
insert into #t(Spot,name,pct)values(1,'A',2),(1,'B',8),(1,'C',6),(2,'A',4),(2,'B',5),(3,'A',5),(3,'D',1),(3,'E',4)
Empty table :
declare #tt table (Spot int,name varchar(1),pct int)
select * from #t OUTER APPLY #tt

Related

How do you save a join as a Snowflake table?

I understand how a Join is made in Snowflake but I need to temporarily save it as a table, how can it be possible make it using consola?
(For this project I can't do it with any connectors)
You are looking for the CREATE TABLE AS SELECT or CTAS pattern:
CREATE TABLE tmp_table AS
SELECT a.*, b.*
FROM (SELECT * FROM VALUES ('a'),('b')) a(string)
CROSS JOIN (SELECT * FROM VALUES (1),(2)) b(vals);
SELECT * FROM tmp_table;
gives:
STRING
VALS
a
1
a
2
b
1
b
2
and if you want it to be temporary for the session you might want to add TEMP
CREATE TEMP TABLE tmp_table AS

SQL Server : DELETE FROM table FROM table

I keep coming across this DELETE FROM FROM syntax in SQL Server, and having to remind myself what it does.
DELETE FROM tbl
FROM #tbl
INNER JOIN tbl ON fk = pk AND DATEDIFF(day, #tbl.date, tbl.Date) = 0
EDIT: To make most of the comments and suggested answers make sense, the original question had this query:
DELETE FROM tbl
FROM tbl2
As far as I understand, you would use a structure like this where you are restricting which rows to delete from the first table based on the results of the from query. But to do that you need to have a correlation between the two.
In your example there is no correlation, which will effectively be a type of cross join which means "for every row in tbl2, delete every row in tbl1". In other words it will delete every row in the first table.
Here is an example:
declare #t1 table(A int, B int)
insert #t1 values (15, 9)
,(30, 10)
,(60, 11)
,(70, 12)
,(80, 13)
,(90, 15)
declare #t2 table(A int, B int)
insert #t2 values (15, 9)
,(30, 10)
,(60, 11)
delete from #t1 from #t2
The result is an empty #t1.
On the other hand this would delete just the matching rows:
delete from #t1 from #t2 t2 join #t1 t1 on t1.A=t2.A
I haven't seen this anywhere before. The documentation of DELETE tells us:
FROM table_source Specifies an additional FROM clause. This
Transact-SQL extension to DELETE allows specifying data from
and deleting the corresponding rows from the table in
the first FROM clause.
This extension, specifying a join, can be used instead of a subquery
in the WHERE clause to identify rows to be removed.
Later in the same document we find
D. Using joins and subqueries to data in one table to delete rows in
another table The following examples show two ways to delete rows in
one table based on data in another table. In both examples, rows from
the SalesPersonQuotaHistory table in the AdventureWorks2012 database
are deleted based on the year-to-date sales stored in the SalesPerson
table. The first DELETE statement shows the ISO-compatible subquery
solution, and the second DELETE statement shows the Transact-SQL FROM
extension to join the two tables.
With these examples to demonstrate the difference
-- SQL-2003 Standard subquery
DELETE FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID IN
(SELECT BusinessEntityID
FROM Sales.SalesPerson
WHERE SalesYTD > 2500000.00);
-- Transact-SQL extension
DELETE FROM Sales.SalesPersonQuotaHistory
FROM Sales.SalesPersonQuotaHistory AS spqh
INNER JOIN Sales.SalesPerson AS sp
ON spqh.BusinessEntityID = sp.BusinessEntityID
WHERE sp.SalesYTD > 2500000.00;
The second FROM mentions the same table in this case. This is a weird way to get something similar to an updatable cte or a derived table
In the third sample in section D the documentation states clearly
-- No need to mention target table more than once.
DELETE spqh
FROM
Sales.SalesPersonQuotaHistory AS spqh
INNER JOIN Sales.SalesPerson AS sp
ON spqh.BusinessEntityID = sp.BusinessEntityID
WHERE sp.SalesYTD > 2500000.00;
So I get the impression, the sole reason for this was to use the real table's name as the DELETE's target instead of an alias.

TSQL - subquery inside Begin End

Consider the following query:
begin
;with
t1 as (
select top(10) x from tableX
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- --------------------------
select *
from t2
join t3 on t3.x=t2.x
end
go
I was wondering if t1 is called twice hence tableX being called twice (which means t1 acts like a table)?
or just once with its rows saved in t1 for the whole query (like a variable in a programming lang)?
Just trying to figure out how tsql engine optimises this. This is important to know because if t1 has millions of rows and is being called many times in the whole query generating the same result then there should be a better way to do it..
Just create the table:
CREATE TABLE tableX
(
x int PRIMARY KEY
);
INSERT INTO tableX
VALUES (1)
,(2)
Turn on the execution plan generation and execute the query. You will get something like this:
So, yes, the table is queried two times. If you are using complex common table expression and you are working with huge amount of data, I will advice to store the result in temporary table.
Sometimes, I am getting very bad execution plans for complex CTEs which were working nicely in the past. Also, you are allowed to define indexes on temporary tables and improve performance further.
To be honest, there is no answer... The only answer is Race your horses (Eric Lippert).
The way you write your query does not tell you, how the engine will put it in execution. This depends on many, many influences...
You tell the engine, what you want to get and the engine decides how to get this.
This may even differ between identical calls depending on statistics, currently running queries, existing cached results etc.
Just as a hint, try this:
USE master;
GO
CREATE DATABASE testDB;
GO
USE testDB;
GO
--I create a physical test table with 1.000.000 rows
CREATE TABLE testTbl(ID INT IDENTITY PRIMARY KEY, SomeValue VARCHAR(100));
WITH MioRows(Nr) AS (SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2 CROSS JOIN master..spt_values v3)
INSERT INTO testTbl(SomeValue)
SELECT CONCAT('Test',Nr)
FROM MioRows;
--Now we can start to test this
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE #dt DATETIME2 = SYSUTCDATETIME();
--Your approach with CTEs
;with t1 as (select * from testTbl)
,t2 as (select * from t1)
,t3 as (select * from t1)
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target1
from t2
join t3 on t3.ID=t2.ID;
SELECT 'Final CTE',DATEDIFF(MILLISECOND,#dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE #dt DATETIME2 = SYSUTCDATETIME();
--Writing the intermediate result into a physical table
SELECT * INTO test1 FROM testTbl;
SELECT 'Write into test1',DATEDIFF(MILLISECOND,#dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target2
from test1 t2
join test1 t3 on t3.ID=t2.ID
SELECT 'Final physical table',DATEDIFF(MILLISECOND,#dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE #dt DATETIME2 = SYSUTCDATETIME();
--Same as before, but with an primary key on the intermediate table
SELECT * INTO test2 FROM testTbl;
SELECT 'Write into test2',DATEDIFF(MILLISECOND,#dt,SYSUTCDATETIME());
ALTER TABLE test2 ADD PRIMARY KEY (ID);
SELECT 'Add PK',DATEDIFF(MILLISECOND,#dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target3
from test2 t2
join test2 t3 on t3.ID=t2.ID
SELECT 'Final physical tabel with PK',DATEDIFF(MILLISECOND,#dt,SYSUTCDATETIME());
--Clean up (Careful with real data!!!)
GO
USE master;
GO
--DROP DATABASE testDB;
GO
On my system the
first takes 674ms, the
second 1.205ms (297 for writing into test1) and the
third 1.727ms (285 for writing into test2 and ~650ms for creating the index.
Although the query is performed twice, the engine can take advantage of cached results.
Conclusio
The engine is really smart... Don't try to be smarter...
If the table would cover a lot of columns and much more data per row the whole test might return something else...
If your CTEs (sub-queries) involve much more complex data with joins, views, functions and so on, the engine might get into troubles finding the best approach.
If performance matters, you can race your horses to test it out. One hint: I sometimes used a TABLE HINT quite successfully: FORCE ORDER. This will perform joins in the order specified in the query.
Here is a simple example to test the theories:
First, via temporary table which calls the matter only once.
declare #r1 table (id int, v uniqueidentifier);
insert into #r1
SELECT * FROM
(
select id=1, NewId() as 'v' union
select id=2, NewId()
) t
-- -----------
begin
;with
t1 as (
select * from #r1
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- ----------------
select * from t2
union all select * from t3
end
go
On the other hand, if we put the matter inside t1 instead of the temporary table, it gets called twice.
t1 as (
select id=1, NewId() as 'v' union
select id=2, NewId()
)
Hence, my conclusion is to use temporary table and not reply on cached results.
Also, ive implemented this on a large scale query that called the "matter" twice only and after moving it to temporary table the execution time went straight half!!

2 nvarchar fields are not matching though the data is same?

I want to join 2 tables using an Inner Join on 2 columns, both are of (nvarchar, null) type. The 2 columns have the same data in them, but the join condition is failing. I think it is due to the spaces contained in the column values.
I have tried LTRIM, RTRIM also
My query:
select
T1.Name1, T2.Name2
from
Table1 T1
Inner Join
Table2 on T1.Name1 = T2.Name2
I have also tried like this:
on LTRIM(RTRIM(T1.Name1)) = LTRIM(RTRIM(T2.Name2))
My data:
Table1 Table2
------ ------
Name1(Column) Name2(Column)
----- ------
Employee Data Employee Data
Customer Data Customer Data
When I check My data in 2 tables with
select T1.Name1,len(T1.Name1)as Length1,Datalength(T1.Name1)as DataLenght1 from Table1 T1
select T2.Name2,len(T2.Name2)as Length2,Datalength(T2.Name2)as DataLenght2 from Table2 T2
The result is different Length and DataLength Values for the 2 tables,They are not same for 2 tables.
I can't change the original data in the 2 tables. How can I fix this issue.
Thank You
Joins do not have special rules for equality. The equality operator always works the same way. So if a = b then the join on a = b would work. Therefore, a <> b.
Check the contents of those fields. They will not be the same although you think they are:
select convert(varbinary(max), myCol) from T
Unicode has invisible characters (that only ever seem to cause trouble).
declare #t table (name varchar(20))
insert into #t(name)values ('Employee Data'),('Customer Data')
declare #tt table (name varchar(20))
insert into #tt(name)values ('EmployeeData'),('CustomerData')
select t.name,tt.name from #t t
INNER JOIN #tt tt
ON RTRIM(LTRIM(REPLACE(t.name,' ',''))) = RTRIM(LTRIM(REPLACE(tt.name,' ','')))
I would follow the following schema
Create a new table to store all the possible names
Add needed keys and indexes
Populate with existing names
Add columns to your existing tables to store the index of the name
Create relative foreign keys
Populate the new columns with correct indexes
Create procedure to perform an insert in new tables names only in case the value is not existing
Perform the join using the new indexes

SQL Server can you use EXCEPT or INTERSECT and ignore a column?

Here's my question,
CREATE TABLE
#table1(ID int, Fruit varchar(50), Veg varchar(50))
INSERT INTO #table1 (ID,Fruit,Veg)
VALUES (1,'Apple', 'Potato')
CREATE TABLE
#table2(ID int, Fruit varchar(50), Veg varchar(50))
INSERT INTO #table2 (ID,Fruit,Veg)
VALUES (2,'Apple', 'Potato')
SELECT * FROM #table1 INTERSECT SELECT * FROM #table2
I have two tables and I want to find rows which are the same in both, but both tables have different and unrelated ID columns. Is there any way to use INTERSECT or EXCEPT on two tables, but ignore the ID in the comparison?
I need to keep the ID's on the returned rows, so on the example above, two rows would be returned, one with ID = 1 and another with ID=2
If anything other than the ID's is different, then nothing would be returned.
Thanks!
I don't think this can be done with INTERSECT. Maybe with a join instead?
SELECT t1.id, t2.id, t1.veg, t1.fruit
FROM Table1 as t1
INNER JOIN Table2 as t2
ON t1.veg = t2.veg AND t1.fruit = t2.fruit

Resources