Cross apply vs CTE in SQL Server

I have two tables.
The first table is named tblprovince:
create table tblprovince(
provinceid int identity (1,1) not null primary key,
provinceName nvarchar(max),
description nvarchar(max)
);
The second table is named tblcity:
create table tblcity(
cityid int identity (1,1),
CityName nvarchar(max),
population int,
provinceid int foreign key references tblprovince(provinceid)
);
I need to list all provinces that have at least two large cities. A large city is defined as having a population of at least one million residents. The query must return the following columns:
tblProvince.ProvinceId
tblProvince.ProvinceName
a derived column named LargeCityCount that presents the total count of large cities for the province
 
select p.provinceId, p.provincename, citysummary.LargeCityCount
from tblprovince p
cross apply (
select count(*) as LargeCityCount from tblcity c
where c.population >= 1000000 and c.provinceid=p.provinceid
) citysummary
where citysummary.LargeCityCount >= 2
Is this query correct?
Are there other methods that allow me to achieve my goal?

Another option is an inner join with GROUP BY and HAVING:
SELECT tp.provinceid, tp.provinceName, COUNT(tc.cityid) AS LargeCityCount
FROM tblprovince tp
INNER JOIN tblcity tc ON tc.provinceid = tp.provinceid
WHERE tc.population >= 1000000
GROUP BY tp.provinceid, tp.provinceName
HAVING COUNT(tc.cityid) > 1
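For a quick sanity check of the GROUP BY / HAVING approach, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for SQL Server. The sample rows, the function name and its parameters are made up for illustration, and SQLite's column types differ from T-SQL (TEXT rather than nvarchar(max)), so only the portable part of the query is exercised.

```python
import sqlite3

QUERY = """
SELECT p.provinceid, p.provinceName, COUNT(c.cityid) AS LargeCityCount
FROM tblprovince p
INNER JOIN tblcity c ON c.provinceid = p.provinceid
WHERE c.population >= ?
GROUP BY p.provinceid, p.provinceName
HAVING COUNT(c.cityid) >= ?
ORDER BY p.provinceid
"""

def provinces_with_large_cities(min_cities=2, min_population=1_000_000):
    """Return (provinceid, provinceName, LargeCityCount) rows for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblprovince (
            provinceid INTEGER PRIMARY KEY,
            provinceName TEXT,
            description TEXT
        );
        CREATE TABLE tblcity (
            cityid INTEGER PRIMARY KEY,
            CityName TEXT,
            population INTEGER,
            provinceid INTEGER REFERENCES tblprovince(provinceid)
        );
        -- Hypothetical sample data, not taken from the question.
        INSERT INTO tblprovince VALUES
            (1, 'North', NULL), (2, 'South', NULL), (3, 'East', NULL);
        INSERT INTO tblcity (CityName, population, provinceid) VALUES
            ('A', 2500000, 1), ('B', 1000000, 1), ('C', 40000, 1),
            ('D', 3000000, 2), ('E', 999999, 2),
            ('F', 10000, 3);
    """)
    rows = conn.execute(QUERY, (min_population, min_cities)).fetchall()
    conn.close()
    return rows

print(provinces_with_large_cities())
```

Only 'North' has two cities at or above one million residents in this sample, so it is the only province returned. Because the inner join and the WHERE filter run before grouping, provinces without any large city never reach the HAVING step.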

Related

How is this SQL sub-query correctly vectorising?

Sample data:
CREATE TABLE Departments (
Code INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
Budget decimal NOT NULL
);
CREATE TABLE Employees (
SSN INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
LastName varchar(255) NOT NULL ,
Department INTEGER NOT NULL ,
foreign key (department) references Departments(Code)
);
INSERT INTO Departments(Code,Name,Budget) VALUES(14,'IT',65000);
INSERT INTO Departments(Code,Name,Budget) VALUES(37,'Accounting',15000);
INSERT INTO Departments(Code,Name,Budget) VALUES(59,'Human Resources',240000);
INSERT INTO Departments(Code,Name,Budget) VALUES(77,'Research',55000);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('123234877','Michael','Rogers',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('152934485','Anand','Manikutty',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('222364883','Carol','Smith',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('326587417','Joe','Stevens',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332154719','Mary-Anne','Foster',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332569843','George','O''Donnell',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('546523478','John','Doe',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('631231482','David','Smith',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('654873219','Zacary','Efron',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('745685214','Eric','Goldsmith',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657245','Elizabeth','Doe',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657246','Kumar','Swamy',14);
Problem: "Select the names of departments with more than two employees."
Wikibooks solution:
/*With subquery*/
SELECT D.Name FROM Departments D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code
);
My question: How does this solution work? That is, how does MSSQL know which rows of Departments to keep based on the sub-query? I can't see how the condition WHERE Department = D.Code can return a result that is ordered in a way the outer query can use. I don't think this is a fluke; I think I just don't understand how SQL is vectorised.
This is called a correlated subquery.
That is to say, the inner query is correlated to the outer one through an outer reference, in this case D.Code. Logically, the subquery is therefore evaluated once for every row of D.
It's not a matter of ordering; in fact, this query can return results in any order. For each row of D, the count returned by the subquery must be greater than 2, otherwise the WHERE predicate fails and that row is dropped.
SELECT D.Name FROM Departments D -- Departments has been aliased as D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code -- Here the inner query is being limited by
-- the reference to the outer D table
);
I would probably use ... > 2 rather than 2 < ...
Side point: it's better to always use an explicit table reference in subqueries, e.g. e.Department = D.Code (with Employees aliased as e). Otherwise you could misspell a column and end up referring to an outer column instead of an inner one, and the correlation wouldn't work properly.
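To make the "evaluated for every row of D" idea concrete, the following sketch runs the correlated query and then emulates it with an explicit loop over the outer rows. It uses Python's sqlite3 module rather than SQL Server, trims the tables to the columns the query needs, and the function names are hypothetical. The loop models the logical meaning only; a real optimizer is free to execute the query differently.

```python
import sqlite3

def build_db():
    """Create the sample tables, trimmed to the columns the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Departments (Code INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Employees (
            SSN INTEGER PRIMARY KEY,
            Department INTEGER NOT NULL REFERENCES Departments(Code)
        );
        INSERT INTO Departments VALUES
            (14, 'IT'), (37, 'Accounting'), (59, 'Human Resources'), (77, 'Research');
        INSERT INTO Employees VALUES
            (123234877, 14), (152934485, 14), (222364883, 37), (326587417, 37),
            (332154719, 14), (332569843, 77), (546523478, 59), (631231482, 77),
            (654873219, 59), (745685214, 59), (845657245, 14), (845657246, 14);
    """)
    return conn

def correlated(conn):
    """The correlated subquery, with an explicit alias on the inner table."""
    sql = """
        SELECT D.Name
        FROM Departments D
        WHERE (SELECT COUNT(*) FROM Employees e WHERE e.Department = D.Code) > 2
        ORDER BY D.Name
    """
    return [name for (name,) in conn.execute(sql)]

def row_by_row(conn):
    """Logical model: run the inner query once per outer row, using that row's Code."""
    kept = []
    for code, name in conn.execute("SELECT Code, Name FROM Departments").fetchall():
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM Employees WHERE Department = ?", (code,)
        ).fetchone()
        if count > 2:  # the outer WHERE predicate
            kept.append(name)
    return sorted(kept)

conn = build_db()
print(correlated(conn))
print(row_by_row(conn))
```

Both functions return the same departments, which shows that no ordering is involved: each outer row supplies its own D.Code, and the count computed for that value decides whether the row is kept.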

Self Join on large tables slowness issue

I have two tables:
table1 (cid, duedate, currency, value)
main_table1 (cid)
My query is below. I am computing the correlation between each pair of cids. table1 contains 3 million records (cid and duedate are unique in combination), and main_table1 contains 1500 records, all unique.
SELECT
b.cid, c.cid,
(COUNT(*) * SUM(b.value * c.value) -
SUM(b.value) * SUM(c.value)) /
(SQRT(COUNT(*) * SUM(b.value * b.value) -
SUM(b.value) * SUM(b.value)) *
SQRT(COUNT(*) * SUM(c.value * c.value) -
SUM(c.value) * SUM(c.value))
) AS correl_ij
FROM
main_table1 a
JOIN
table1 AS b ON a.cid = b.cid
JOIN
table1 AS c ON b.cid < c.cid
AND b.duedate = c.duedate
AND b.currency = c.currency
GROUP BY
b.cid, c.cid
Please suggest how to optimize this query, because it is running slowly.
CREATE TABLE #table1(
id int identity,
cid int NOT NULL,
duedate date NOT NULL,
currency char(3) NOT NULL,
value float,
PRIMARY KEY(id,currency,cid,duedate)
);
CREATE TABLE #main_table1(
cid int NOT NULL PRIMARY KEY,
currency char(3)
);
-- #main_table1 contains 155000 cid records, with no duplicate values
insert into #main_table1
values(19498,'ABC'),(19500,'ABC'),(19534,'ABC')
INSERT INTO #table1(CID,DUEDATE,currency,value)
VALUES(19498,'2016-12-08','USD',-0.0279702098021799) ,
(19498,'2016-12-12','USD',0.0151285161000268),
(19498,'2016-12-15','USD',-0.00965080868337728),
(19498,'2016-12-19','USD',0.00808331709091531)
There are 3 million records in this table for different dates and cids, and most of the cids are present in #main_table1.
I am using b.cid < c.cid to remove duplicate pairs, because I am deriving the correlation between each pair of cids.
Once the correlation 19498 -> 19500 has been calculated, I do not want 19500 -> 19498, because it would be the same value, just duplicated.
That PK is silly. Why would you include the identity column (id) in a composite PK, let alone in the first position? Drop id unless you have to have it for some misguided reason.
PRIMARY KEY(cid, currency, duedate)
Or use the natural key, if it is different.
If you're commonly joining or sorting on the cid column, you probably want a clustered index on that column or a composite beginning with that column.
If cid, duedate is unique then you can consider removing the id altogether.
If you want to retain id for some reason, make it PRIMARY KEY NONCLUSTERED, and specify a clustered index on cid, duedate.
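Before tuning, it can help to confirm on a tiny data set what the query in the question computes. The sketch below is an illustration only: it runs the same self-join in SQLite through Python's sqlite3 module, with made-up values and cids, a registered SQRT function (an assumption, since not every SQLite build ships one), and the suggested key order (cid, currency, duedate). It says nothing about SQL Server performance.

```python
import math
import sqlite3

def pairwise_correlations():
    """Return (cid_b, cid_c, correlation) for each unordered pair of sample cids."""
    conn = sqlite3.connect(":memory:")
    # Register SQRT explicitly so the query does not depend on the SQLite build.
    conn.create_function("SQRT", 1, math.sqrt)
    conn.executescript("""
        CREATE TABLE main_table1 (cid INTEGER PRIMARY KEY);
        CREATE TABLE table1 (
            cid INTEGER NOT NULL,
            duedate TEXT NOT NULL,
            currency TEXT NOT NULL,
            value REAL,
            PRIMARY KEY (cid, currency, duedate)
        );
        -- Hypothetical sample data: cid 2 moves with cid 1, cid 3 moves against it.
        INSERT INTO main_table1 VALUES (1), (2), (3);
        INSERT INTO table1 (cid, duedate, currency, value) VALUES
            (1, '2016-12-08', 'USD', 1.0), (1, '2016-12-12', 'USD', 2.0), (1, '2016-12-15', 'USD', 3.0),
            (2, '2016-12-08', 'USD', 2.0), (2, '2016-12-12', 'USD', 4.0), (2, '2016-12-15', 'USD', 6.0),
            (3, '2016-12-08', 'USD', 3.0), (3, '2016-12-12', 'USD', 2.0), (3, '2016-12-15', 'USD', 1.0);
    """)
    rows = conn.execute("""
        SELECT b.cid, c.cid,
               (COUNT(*) * SUM(b.value * c.value) - SUM(b.value) * SUM(c.value)) /
               (SQRT(COUNT(*) * SUM(b.value * b.value) - SUM(b.value) * SUM(b.value)) *
                SQRT(COUNT(*) * SUM(c.value * c.value) - SUM(c.value) * SUM(c.value))
               ) AS correl_ij
        FROM main_table1 a
        JOIN table1 AS b ON a.cid = b.cid
        JOIN table1 AS c ON b.cid < c.cid
                        AND b.duedate = c.duedate
                        AND b.currency = c.currency
        GROUP BY b.cid, c.cid
        ORDER BY b.cid, c.cid
    """).fetchall()
    conn.close()
    return rows

for cid_b, cid_c, corr in pairwise_correlations():
    print(cid_b, cid_c, round(corr, 6))
```

With three cids the join yields the three unordered pairs (1,2), (1,3) and (2,3), each exactly once. The number of pairs grows roughly with the square of the number of cids, which is worth keeping in mind when judging how much an index alone can help.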

combine tables into one table with TSQL

We have three tables, Agencies, Regions and Countries which we wish to combine into a new Countries table.
The old schema is
oldSchema.Agencies
AgentId PK,int
TaxName varchar(3)
TaxRate decimal(18,3)
oldSchema.Regions
RegionCode PK,tinyint
AgentId int
oldSchema.Countries
CountryCode PK,varchar(2)
CountryName varchar(50)
RegionCode tinyint
Our organisation no longer uses agencies and regions, so we want to combine the agency data into a newSchema.Countries table.
The new schema is
newSchema.Countries
CountryCode PK,varchar(2)
CountryName varchar(50)
TaxName varchar(3)
TaxRate decimal(18,3)
The following insert query is incorrect in that it inserts identical data into every row:
INSERT INTO newSchema.Countries (
CountryCode
,CountryName
,TaxName
,TaxRate
)
SELECT OldSchema.Countries.CountryCode
,OldSchema.Countries.CountryName
,OldSchema.Agencies.TaxName
,OldSchema.Agencies.TaxRate
FROM OldSchema.Agencies
INNER JOIN (
OldSchema.Regions INNER JOIN OldSchema.Countries ON OldSchema.Regions.RegionCode = OldSchema.Countries.RegionCode
) ON OldSchema.Agencies.AgentId = OldSchema.Regions.AgentId
WHERE NewSchema.Countries.CountryCode = OldSchema.Countries.CountryCode
How do we insert the TaxName and TaxRate for each agency into the new Countries table such that each country gets the correct tax as it was applied from the Agencies table?
I am not sure exactly what problem you are facing, but I think you have multiple regions with the same AgentId, resulting in repeated data for a specific country.
If that is your problem, try the query below; otherwise, please provide more details so that we can help.
INSERT INTO newSchema.Countries
(CountryCode,CountryName,TaxName,TaxRate)
select distinct
c.CountryCode,
c.CountryName,
a.TaxName,
a.TaxRate
from oldSchema.Agencies a
inner join oldSchema.Regions r on a.AgentId = r.AgentId
inner join oldSchema.Countries c on r.RegionCode = c.RegionCode
Also check this fiddle with sample data that I've created for demonstration purposes.
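As a rough way to try the join on sample data locally, the sketch below rebuilds the old and new schemas in SQLite through Python's sqlite3 module. The attached in-memory databases merely imitate the oldSchema and newSchema names, the rows and tax values are invented, and the column types are SQLite approximations of the ones listed in the question.

```python
import sqlite3

def migrate_countries():
    """Populate newSchema.Countries from the old tables and return its rows."""
    # Autocommit mode, so ATTACH is never issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("ATTACH DATABASE ':memory:' AS oldSchema")
    conn.execute("ATTACH DATABASE ':memory:' AS newSchema")
    conn.executescript("""
        CREATE TABLE oldSchema.Agencies (
            AgentId INTEGER PRIMARY KEY, TaxName TEXT, TaxRate NUMERIC);
        CREATE TABLE oldSchema.Regions (
            RegionCode INTEGER PRIMARY KEY, AgentId INTEGER);
        CREATE TABLE oldSchema.Countries (
            CountryCode TEXT PRIMARY KEY, CountryName TEXT, RegionCode INTEGER);
        CREATE TABLE newSchema.Countries (
            CountryCode TEXT PRIMARY KEY, CountryName TEXT, TaxName TEXT, TaxRate NUMERIC);

        -- Hypothetical sample data: agency 1 serves two regions.
        INSERT INTO oldSchema.Agencies VALUES (1, 'VAT', 0.2), (2, 'GST', 0.1);
        INSERT INTO oldSchema.Regions VALUES (10, 1), (20, 2), (30, 1);
        INSERT INTO oldSchema.Countries VALUES
            ('FR', 'France', 10), ('NZ', 'New Zealand', 20), ('DE', 'Germany', 30);

        -- Country -> Region -> Agency: each country picks up its agency's tax.
        INSERT INTO newSchema.Countries (CountryCode, CountryName, TaxName, TaxRate)
        SELECT DISTINCT c.CountryCode, c.CountryName, a.TaxName, a.TaxRate
        FROM oldSchema.Agencies a
        INNER JOIN oldSchema.Regions r ON a.AgentId = r.AgentId
        INNER JOIN oldSchema.Countries c ON r.RegionCode = c.RegionCode;
    """)
    rows = conn.execute(
        "SELECT CountryCode, CountryName, TaxName, TaxRate "
        "FROM newSchema.Countries ORDER BY CountryCode"
    ).fetchall()
    conn.close()
    return rows

for row in migrate_countries():
    print(row)
```

With RegionCode and CountryCode as primary keys, as listed in the question, every country matches at most one region and one agency, so the join produces one row per country and DISTINCT acts only as a safeguard.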

TSQL to insert a set of rows and dependent rows

I have two tables:
Orders (with an identity order id field)
OrderItems (with a foreign key to order id)
In a stored proc, I have a list of orders that I need to duplicate. Is there a good way to do this in a stored proc without a cursor?
Edit:
This is on SQL Server 2008.
A sample spec for the table might be:
CREATE TABLE Orders (
OrderID INT IDENTITY(1,1),
CustomerName VARCHAR(100),
CONSTRAINT PK_Order PRIMARY KEY (OrderID)
);
CREATE TABLE OrderItems (
OrderID INT,
LineNumber INT,
Price money,
Notes VARCHAR(100),
CONSTRAINT PK_OrderItem PRIMARY KEY (OrderID, LineNumber),
CONSTRAINT FK_OrderItem_Order FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);
The stored proc is passed a customerName of 'fred', so it's trying to clone all orders where CustomerName = 'fred'.
To give a more concrete example:
Fred happens to have 2 orders:
Order 1 has line numbers 1,2,3
Order 2 has line numbers 1,2,4,6.
If the next identity in the table was 123, then I would want to create:
Order 123 with lines 1,2,3
Order 124 with lines 1,2,4,6
On SQL Server 2008 you can use MERGE and the OUTPUT clause to get the mappings between the original and cloned id values from the insert into Orders then join onto that to clone the OrderItems.
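-- Table variable that records which new OrderId was generated for each original one
DECLARE @IdMappings TABLE(
New_OrderId INT,
Old_OrderId INT)
;WITH SourceOrders AS
(
SELECT *
FROM Orders
WHERE CustomerName = 'fred'
)
MERGE Orders AS T
USING SourceOrders AS S
ON 0 = 1 -- never true, so every source row takes the NOT MATCHED branch
WHEN NOT MATCHED THEN
INSERT (CustomerName)
VALUES (S.CustomerName)
-- Unlike INSERT ... OUTPUT, the OUTPUT clause of MERGE can reference source columns such as S.OrderId
OUTPUT inserted.OrderId,
S.OrderId INTO @IdMappings;
-- Copy the line items, swapping each original OrderId for the OrderId of its clone
INSERT INTO OrderItems (OrderID, LineNumber, Price, Notes)
SELECT IDM.New_OrderId,
OI.LineNumber,
OI.Price,
OI.Notes
FROM OrderItems OI
JOIN @IdMappings IDM
ON IDM.Old_OrderId = OI.OrderID;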

Cannot create indexed view

CREATE TABLE orders
(
order_no INT NOT NULL PRIMARY KEY,
prod_id INT NOT NULL,
quantity INT
);
CREATE VIEW product_stats WITH SCHEMABINDING
AS
SELECT a.prod_id, a.product_name,
(SELECT COUNT(*) FROM dbo.orders WHERE prod_id = a.prod_id) AS total FROM dbo.products a;
CREATE UNIQUE CLUSTERED INDEX [IDX_Order_Details_X]
ON product_stats (prod_id, total)
It complains:
Column 'total' in view 'product_stats' cannot be used in an index or statistics or as a partition key because it does user or system data access.
The database is MS SQL Server.
An indexed view cannot contain COUNT(*) (COUNT_BIG(*) is required when the view uses GROUP BY) or a subquery. See the "View Restrictions" section of this article.
