How is this SQL sub-query correctly vectorising? - sql-server

Sample data:
CREATE TABLE Departments (
Code INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
Budget decimal NOT NULL
);
CREATE TABLE Employees (
SSN INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
LastName varchar(255) NOT NULL ,
Department INTEGER NOT NULL ,
foreign key (department) references Departments(Code)
);
INSERT INTO Departments(Code,Name,Budget) VALUES(14,'IT',65000);
INSERT INTO Departments(Code,Name,Budget) VALUES(37,'Accounting',15000);
INSERT INTO Departments(Code,Name,Budget) VALUES(59,'Human Resources',240000);
INSERT INTO Departments(Code,Name,Budget) VALUES(77,'Research',55000);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('123234877','Michael','Rogers',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('152934485','Anand','Manikutty',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('222364883','Carol','Smith',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('326587417','Joe','Stevens',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332154719','Mary-Anne','Foster',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332569843','George','O''Donnell',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('546523478','John','Doe',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('631231482','David','Smith',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('654873219','Zacary','Efron',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('745685214','Eric','Goldsmith',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657245','Elizabeth','Doe',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657246','Kumar','Swamy',14);
Problem: "Select the names of departments with more than two employees."
Wikibooks solution:
/*With subquery*/
SELECT D.Name FROM Departments D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code
);
My question: How does this solution work? That is, how does MSSQL know which rows in Departments to keep based on the sub-query? I can't see any way that the condition WHERE Department = D.Code can return a result that is ordered in a useful way for the outer query. I don't think that this is a fluke; I think that I just don't understand how SQL is vectorised.

This is called a correlated subquery.
That is to say, the inner query is correlated to the outer one by an outer reference, in this case D.Code. The subquery is therefore evaluated, logically, once for every row of D.
It's not a matter of ordering; in fact, this query can return results in any order. The subquery's result for a given row must be greater than 2, otherwise the WHERE predicate fails and that row is dropped.
SELECT D.Name FROM Departments D -- Departments has been aliased as D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code -- Here the inner query is being limited by
-- the reference to the outer D table
);
I would probably use ... > 2 rather than 2 < ...
Side point: It's better to always use an explicit table reference in subqueries, e.g. e.Department = D.Code. Otherwise you could misspell a column and silently end up referring to an outer column instead of an inner one, and the correlation wouldn't work properly.
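To see the per-row evaluation concretely, the same subquery can be moved into the SELECT list so that its value is visible for every department. This is only an illustrative sketch against the sample data above; the E alias and the EmployeeCount column name are additions of mine, not part of the Wikibooks solution.

```sql
-- Show the value the correlated subquery produces for each Departments row.
SELECT D.Name,
       (SELECT COUNT(*)
        FROM Employees E
        WHERE E.Department = D.Code) AS EmployeeCount
FROM Departments D;

-- An equivalent formulation without a subquery: JOIN + GROUP BY + HAVING.
SELECT D.Name
FROM Departments D
JOIN Employees E ON E.Department = D.Code
GROUP BY D.Code, D.Name
HAVING COUNT(*) > 2;
```

With the sample data the counts are IT = 5, Accounting = 2, Human Resources = 3 and Research = 2, so only IT and Human Resources pass the 2 < ... test.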

Related

I need to find multiple rows with iteration without using loop

Let's say I have 2 tables.
Users Table
and I have one more table which defines the hierarchy of users.
hierarchy Table
So as you can see:
C is a supervisor of D
B is a supervisor of C
A is a supervisor of B
So when I pass User D, it should return all the supervisors: A, B, C.
Likewise, when I pass User C, it should return all the supervisors: A, B.
What I tried.
Create table Users
(
Id int primary key identity (1,1),
Name varchar(1),
)
Insert into Users values ('A')
Insert into Users values ('B')
Insert into Users values ('C')
Insert into Users values ('D')
Create table Hierarchy
(
Id int primary key identity (1,1),
EmployeeId int FOREIGN KEY REFERENCES Users(Id),
SupervisorId int FOREIGN KEY REFERENCES Users(Id)
)
Insert into Hierarchy values (4,3)
Insert into Hierarchy values (3,2)
Insert into Hierarchy values (2,1)
select * from Users
select * from Hierarchy
with HierarchyData as
(
select mbh.* from Hierarchy mbh where mbh.EmployeeId = 4
union all
select mbh.* from Hierarchy mbh
join Hierarchy on mbh.SupervisorId = Hierarchy.EmployeeId
where mbh.EmployeeId <> 4
)
select e.Name as EmpName, s.Name as SupervisorName from HierarchyData h
join Users e on h.EmployeeId = e.Id
join Users s on h.SupervisorId = s.Id
But I am getting only one level of data.
Any kind of help would be appreciated.
@Vishal, I've written a query based on my understanding of your question; can you please check whether it works for you?
Here I used LEFT JOIN, but you can also use INNER JOIN.
With LEFT JOIN, A has no supervisor in your example, so the supervisor columns of that record are empty (NULL).
With INNER JOIN, you get only the B, C and D records.
Please check the test query below.
DECLARE @User TABLE
(
UserID INT,
UserName NVARCHAR(50)
)
DECLARE @EmployeeTable TABLE
(
ID INT,
EmployeeID INT,
supervisorID INT
)
INSERT INTO @User VALUES(1,'A'),
(2,'B'),
(3,'C'),
(4,'D')
INSERT INTO @EmployeeTable VALUES
(1,4,3),
(2,3,2),
(3,2,1)
SELECT [U].[UserName] [EmployeeName],
[ET].[EmployeeID],
[ET].[SupervisorID],[ST].[SupervisorName]
FROM @User [U]
LEFT JOIN @EmployeeTable [ET]
ON [U].[UserID] = [ET].[EmployeeID]
LEFT JOIN
(
SELECT [U].UserName [SupervisorName] ,[ST].* FROM @User [U]
INNER JOIN @EmployeeTable [ST]
ON [ST].[supervisorID] = [U].[UserID]
) [ST]
ON [ST].[supervisorID] = [ET].[supervisorID]
Left join query result
Inner join query result
Let me know if I can help more :).
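For the original goal (walking up the chain from one user to all of their supervisors), a recursive CTE has to reference itself in the second SELECT. The attempt in the question joins Hierarchy to Hierarchy instead, so it never recurses. A minimal sketch against the Users and Hierarchy tables from the question, where the @StartUserId variable and the Depth column are assumptions added for illustration:

```sql
DECLARE @StartUserId int = 4;  -- hypothetical parameter: 4 is user D

WITH HierarchyData AS
(
    -- Anchor member: the direct supervisor of the starting user
    SELECT h.EmployeeId, h.SupervisorId, 1 AS Depth
    FROM Hierarchy h
    WHERE h.EmployeeId = @StartUserId
    UNION ALL
    -- Recursive member: the supervisor of each supervisor found so far
    SELECT h.EmployeeId, h.SupervisorId, hd.Depth + 1
    FROM Hierarchy h
    JOIN HierarchyData hd ON h.EmployeeId = hd.SupervisorId
)
SELECT s.Name AS SupervisorName, hd.Depth
FROM HierarchyData hd
JOIN Users s ON hd.SupervisorId = s.Id
ORDER BY hd.Depth;
```

With the sample rows, starting from D (Id 4) this should return C, B and A in that order; starting from C (Id 3) it should return B and A.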

Select all Main table rows with detail table column constraints with GROUP BY

I have 2 tables, tblMain and tblDetail, on SQL Server. They are linked by tblMain.id=tblDetail.OrderID and are used for orders. I have not found exactly the same situation on Stack Overflow.
Below is the sample table design:
/* create and populate tblMain: */
CREATE TABLE tblMain (
ID int IDENTITY(1,1) NOT NULL,
DateOrder datetime NULL,
CONSTRAINT PK_tblMain PRIMARY KEY
(
ID ASC
)
)
GO
INSERT INTO tblMain (DateOrder) VALUES('2021-05-20T12:12:10');
INSERT INTO tblMain (DateOrder) VALUES('2021-05-21T09:13:13');
INSERT INTO tblMain (DateOrder) VALUES('2021-05-22T21:30:28');
GO
/* create and populate tblDetail: */
CREATE TABLE tblDetail (
ID int IDENTITY(1,1) NOT NULL,
OrderID int NULL,
Gencod VARCHAR(255),
Quantity float,
Price float,
CONSTRAINT PK_tblDetail PRIMARY KEY
(
ID ASC
)
)
GO
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(1, '1234567890123', 8, 12.30);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(1, '5825867890321', 2, 2.88);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '7788997890333', 1, 1.77);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '9882254656215', 3, 5.66);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '9665464654654', 4, 10.64);
GO
Here is my SELECT with grouping:
SELECT tblMain.id,SUM(tblDetail.Quantity*tblDetail.Price) AS TotalPrice
FROM tblMain LEFT JOIN tblDetail ON tblMain.id=tblDetail.orderid
WHERE (tblDetail.Quantity<>0) GROUP BY tblMain.id;
GO
This gives:
The wished output:
We see that id=2 is not shown even with LEFT JOIN, as there are no records with OrderID=2 in tblDetail.
How can I design a new query that shows tblMain.id = 2? At the same time, I must keep the WHERE (tblDetail.Quantity<>0) constraint. Many thanks.
EDIT:
The above query serves as a CTE (Common Table Expression) for a main query that also takes a payments table, tblPayments, into account.
After testing, both solutions work.
In my case, the main table has 15K records, while the detail table has some millions. With (tblDetail.Quantity<>0 OR tblDetail.Quantity IS NULL) AND tblDetail.IsActive=1 added to the JOIN ON clause it takes 37s to run, while with the first solution of @pwilcox, where the condition is added to the WHERE clause, it finishes in 29s. That is a time saving of about 20%.
The tblDetail.IsActive column lets me ignore detail rows that are temporarily disabled by setting it to false.
So for me the solution is @pwilcox's answer:
where (tblDetail.quantity <> 0 or tblDetail.quantity is null)
Change
WHERE (tblDetail.Quantity<>0)
to
where (tblDetail.quantity <> 0 or tblDetail.quantity is null)
as the former will omit id = 2, because the corresponding quantity is null after the left join.
And as HABO mentions, you can also make the condition part of your join logic rather than your where clause, avoiding the need for the 'or' condition.
select m.id,
totalPrice = sum(d.quantity * d.price)
from tblMain m
left join tblDetail d
on m.id = d.orderid
and d.quantity <> 0
group by m.id;
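The two variants are not interchangeable in every case. When an order has detail rows but all of them have Quantity = 0, the WHERE version drops the order, whereas the ON version keeps it with a NULL total. A small sketch using table variables (the @Main and @Detail names and their rows are made up for illustration, not taken from the question's data):

```sql
DECLARE @Main TABLE (ID int);
DECLARE @Detail TABLE (OrderID int, Quantity float, Price float);

INSERT INTO @Main VALUES (1), (2);
-- Order 1 only has a zero-quantity row; order 2 has no detail rows at all.
INSERT INTO @Detail VALUES (1, 0, 5.00);

-- Filter in WHERE: order 1 is joined to its zero-quantity row, which is then filtered out.
SELECT m.ID, SUM(d.Quantity * d.Price) AS TotalPrice
FROM @Main m
LEFT JOIN @Detail d ON m.ID = d.OrderID
WHERE (d.Quantity <> 0 OR d.Quantity IS NULL)
GROUP BY m.ID;

-- Filter in ON: the zero-quantity row never matches, so order 1 is NULL-extended and kept.
SELECT m.ID, SUM(d.Quantity * d.Price) AS TotalPrice
FROM @Main m
LEFT JOIN @Detail d ON m.ID = d.OrderID AND d.Quantity <> 0
GROUP BY m.ID;
```

The first query returns only ID 2; the second returns IDs 1 and 2, both with a NULL TotalPrice. Wrapping the sum in ISNULL(..., 0) would turn those NULLs into zeros if that is preferred.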

Show all and only rows in table 1 not in table 2 (using multiple columns)

I have one table (Table1) that has several columns used in combination: Name, TestName, DevName, Dept. When all 4 of these columns have values, the record is inserted into Table2. I need to confirm that all of the Table1 records with values in each of these fields were correctly copied into Table2.
I have created a query for it:
SELECT DISTINCT wr.Name,wr.TestName, wr.DEVName ,wr.Dept
FROM table2 wr
where NOT EXISTS (
SELECT NULL
FROM TABLE1 ym
WHERE ym.Name = wr.Name
AND ym.TestName = wr. TestName
AND ym.DEVName = wr.DEVName
AND ym. Dept = wr. Dept
)
My counts are not adding up, so I believe that this is incorrect. Can you advise me on the best way to write this query for my needs?
You can use the EXCEPT set operator for this one, as long as both SELECT lists have the same number of columns with compatible data types.
SELECT DISTINCT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM table1 ym
EXCEPT
SELECT DISTINCT wr.Name, wr.TestName, wr.DEVName, wr.Dept
FROM table2 wr
This returns distinct rows from the first table where there is not a match in the second table. Read more about EXCEPT and INTERSECT here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017
Your approach should do the job once the tables are swapped. The query below returns everything that is in Table1 but not in Table2:
SELECT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM Table1 ym
WHERE NOT EXISTS (
SELECT 1
FROM table2
WHERE ym.Name = Name AND ym.TestName = TestName AND ym.DEVName = DEVName AND ym. Dept = Dept
)
If the structure of both tables is the same, EXCEPT is probably simpler.
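One difference worth knowing when choosing between the two: EXCEPT treats NULLs in the compared columns as equal, while the = comparisons inside NOT EXISTS never match on NULL. A minimal sketch with made-up table variables (@t1 and @t2 are hypothetical stand-ins for Table1 and Table2, reduced to two columns):

```sql
DECLARE @t1 TABLE (Name varchar(10), Dept varchar(10));
DECLARE @t2 TABLE (Name varchar(10), Dept varchar(10));

INSERT INTO @t1 VALUES ('Ann', NULL);
INSERT INTO @t2 VALUES ('Ann', NULL);

-- EXCEPT: ('Ann', NULL) counts as present in both tables, so no rows come back.
SELECT Name, Dept FROM @t1
EXCEPT
SELECT Name, Dept FROM @t2;

-- NOT EXISTS with =: NULL = NULL is not true, so ('Ann', NULL) is reported as missing.
SELECT a.Name, a.Dept
FROM @t1 a
WHERE NOT EXISTS (
    SELECT 1
    FROM @t2 b
    WHERE b.Name = a.Name AND b.Dept = a.Dept
);
```

If all four columns are always populated, as described in the question, both forms find the same set of rows, with EXCEPT additionally removing duplicates.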
IF OBJECT_ID(N'tempdb..#table1') IS NOT NULL drop table #table1
IF OBJECT_ID(N'tempdb..#table2') IS NOT NULL drop table #table2
create table #table1 (id int, value varchar(10))
create table #table2 (id int)
insert into #table1(id, value) VALUES (1,'value1'), (2,'value2'), (3,'value3')
--test here. Comment next line
insert into #table2(id) VALUES (1) --Comment/Uncomment
select * from #table1
select * from #table2
select #table1.*
from #table1
left JOIN #table2 on
#table1.id = #table2.id
where (#table2.id is not null or not exists (select * from #table2))

Best approach to merge three distinct TSQL select statements

I'm looking to return one tsql statement, that contains four fields, from three separate, unrelated tables.
One table contains a list of objects, say fruits, and for each fruit, I want a sell buy date and best before date.
First statement would therefore look something like:
select fruit from fruit table -- this returns multiple rows
Second statement would look something like:
select sellbuyDate from sellTable -- this returns a single row
and the third would look something like:
select bestbefore from bestTable -- this returns a single row
Don't get too hung up on the table names. I'm working on a legacy system that we can't change, so I need to combine the three tables into one.
The underlying table needs to have all the fields returned in a single row, with the second and third results applied to each row of the first statement.
Apples | 12-12-2008 | 12-12-2009
Pears | 12-12-2008 | 12-12-2009
I've implemented the following temp table:
CREATE TABLE #Fruits
(
Fruit VARCHAR(100),
SellBuyDate DATETIME,
BestBefore DATETIME
)
INSERT INTO #Fruits
SELECT Fruit from fruit
INSERT INTO #Fruits
SELECT sellbuyDate from sellTable
INSERT INTO #Fruits
SELECT bestbefore from bestable
SELECT * from #Fruits
This throws an error, because each insert doesn't supply the three fields specified.
Any other suggestions would be well received.
You can select them all together with a CROSS JOIN, that is, by not specifying any join criteria between the three tables, as follows:
CREATE TABLE fruit ( fruit_name VARCHAR(30) );
CREATE TABLE sellTable ( sellByDate DATETIME );
CREATE TABLE bestTable ( bestBefore DATETIME );
CREATE TABLE allFruits
(
fruit_name VARCHAR(30),
sellByDate DATETIME,
bestBefore DATETIME
);
INSERT INTO fruit (fruit_name)
VALUES ('apple'), ('pear');
INSERT INTO sellTable(sellByDate)
VALUES ('12/05/2012');
INSERT INTO bestTable(bestBefore)
VALUES ('12/12/2012');
INSERT INTO allFruits (fruit_name, bestBefore, sellByDate)
SELECT f.fruit_name, b.bestBefore, s.sellByDate
FROM fruit f, bestTable b, sellTable s;
SELECT *
FROM allFruits;
Maybe you are looking for this answer, although your question could be a little clearer on what it should accomplish:
INSERT INTO #Fruits(fruit)
SELECT Fruit from fruit
INSERT INTO #Fruits(sellbuyDate)
SELECT sellbuyDate from sellTable
INSERT INTO #Fruits(bestbefore)
SELECT bestbefore from bestable
SELECT * from #Fruits
the other possible solution is
insert into #Fruits
select Fruit, sellbuyDate, bestbefore from fruit
cross join sellTable
cross join bestable
Maybe you need this.
SELECT FRUIT,
(SELECT SELLBUYDATE FROM SELLTABLE) AS SELLBUYDATE,
(SELECT BESTBEFORE FROM BESTTABLE) AS BESTBEFORE
FROM FRUIT
or
SELECT FRUIT.FRUIT
, SELLTABLE.SELLBUYDATE
, BESTTABLE.BESTBEFORE
FROM FRUIT
INNER JOIN SELLTABLE ON 1=1
INNER JOIN BESTTABLE ON 1=1
Without the schema of the 3 tables, I would guess it's one of the following.
CREATE TABLE #Fruits
(
Fruit VARCHAR(100),
SellBuyDate DATETIME,
BestBefore DATETIME
)
INSERT INTO #Fruits(Fruit,SellBuyDate,BestBefore)
Select Fruit, Sellbuydate,bestbefore
from fruit,selltable,bestable
Or if your tables are set up properly
INSERT INTO #Fruits(Fruit,SellBuyDate,BestBefore)
Select Fruit, Sellbuydate,bestbefore
from fruit f
inner join selltable s
on f.pkey = s.fkey
inner join bestable b
on f.pkey = b.fkey

SQL Server 2008: Unique constraint for values non-related with columns

I have a simple problem. How can I add a unique constraint for a table, without relating the values to their columns? For example, I have this table
ID_A ID_B
----------
1 2
... ...
In that example, I have the record (1,2). For me, (1,2) = (2,1), so I don't want to allow my database to store both values. I know I can accomplish this using triggers, or checks and functions, but I was wondering if there is any instruction like
CREATE UNIQUE CONSTRAINT AS A SET_CONSTRAINT
You could write a view like this:
select 1 as Dummy
from T t1
join T t2 on t1.ID1 = t2.ID2 AND t1.ID2 = t2.ID1 --join to corresponding row
cross join TwoRows
And create a unique index on Dummy. TwoRows is a table that contains two rows with arbitrary contents. It is supposed to make the unique index fail if there ever is a row in it. Any row in this view indicates a uniqueness violation.
You can do this using an INSTEAD OF INSERT trigger.
Demo
Table Schema
CREATE TABLE te(ID_A INT,ID_B INT)
INSERT te VALUES ( 1,2)
Trigger
Go
CREATE TRIGGER trg_name
ON te
instead OF INSERT
AS
BEGIN
IF EXISTS (SELECT 1
FROM inserted a
WHERE EXISTS (SELECT 1
FROM te b
WHERE ( ( a.id_a = b.id_b
AND a.id_b = b.id_a )
OR ( a.id_a = b.id_a
AND a.id_b = b.id_b ) )))
BEGIN
PRINT 'duplicate record'
ROLLBACK
END
ELSE
INSERT INTO te
SELECT Id_a,id_b
FROM inserted
END
SELECT * FROM te
Insert Script
INSERT INTO te VALUES (2,1) -- Duplicate
INSERT INTO te VALUES (1,2) --Duplicate
INSERT INTO te VALUES (3,2) --Will work
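Another option that avoids a trigger is to normalise each pair with two computed columns and put an ordinary unique constraint on them. This is a sketch of a different technique from the answers above, and the table, column and constraint names (PairDemo, LowId, HighId, UQ_PairDemo_Unordered) are hypothetical:

```sql
CREATE TABLE PairDemo
(
    ID_A int NOT NULL,
    ID_B int NOT NULL,
    -- Smaller and larger of the two values, so (1,2) and (2,1) normalise to the same pair.
    LowId  AS (CASE WHEN ID_A < ID_B THEN ID_A ELSE ID_B END) PERSISTED,
    HighId AS (CASE WHEN ID_A < ID_B THEN ID_B ELSE ID_A END) PERSISTED,
    CONSTRAINT UQ_PairDemo_Unordered UNIQUE (LowId, HighId)
);

INSERT INTO PairDemo (ID_A, ID_B) VALUES (1, 2);  -- succeeds
INSERT INTO PairDemo (ID_A, ID_B) VALUES (3, 2);  -- succeeds
INSERT INTO PairDemo (ID_A, ID_B) VALUES (2, 1);  -- fails: same unordered pair as (1,2)
```

The design choice here is to let the engine enforce uniqueness declaratively, so every insert and update path is covered without extra procedural code.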
