SQL Query Nested Select Syntax - sql-server

I am trying to write a query that returns the total number of agents (AgentID) for each OfficeID. Could someone please point me in the right direction? Pointers to resources with examples of different types of queries would also be useful for the future.
My issue right now is the syntax. I'm not sure where things need to go in order to get the desired output.
Here's what I have as of now:
Tables OFFICE and AGENT:
CREATE TABLE OFFICE
(
OfficeID NVARCHAR(5) UNIQUE,
OfficeAddress NVARCHAR(18) NOT NULL,
PRIMARY KEY(OfficeID)
)
GO
CREATE TABLE AGENT
(
AgentID NVARCHAR(8) UNIQUE,
OfficeID NVARCHAR(5) NOT NULL,
AgentType NVARCHAR(9) NOT NULL,
AgentFName NVARCHAR(10) NOT NULL,
PRIMARY KEY (AgentId),
FOREIGN KEY (OfficeID) REFERENCES OFFICE
ON DELETE CASCADE
ON UPDATE CASCADE
)
GO
Query:
SELECT
OFFICE.OfficeID
FROM
OFFICE,
(SELECT COUNT(AgentID)
FROM AGENT, OFFICE
WHERE OFFICE.OfficeID = AGENT.OfficeID
GROUP BY AGENT.OfficeID)
ORDER BY
OFFICE.OfficeID

I'd do this with a JOIN and GROUP BY, no nesting required or wanted:
SELECT o.OfficeID, COUNT(a.AgentID) AS NumberOfAgents
FROM OFFICE o
LEFT JOIN AGENT a ON a.OfficeID = o.OfficeID
GROUP BY o.OfficeID

Something like this (your desired output appears to be missing):
SELECT O.OfficeID
, (
SELECT COUNT(*)
FROM AGENT A
WHERE A.OfficeID = O.OfficeID
)
FROM OFFICE O
ORDER BY O.OfficeID
Note the use of the table alias which is a recommended practice to keep your queries concise.

You need to be specific about what you want. As far as I can tell, no complex query is required in your case. For example, you can get the output from the query below:
select OfficeID, count(1) as NoOfAgents
from AGENT
group by OfficeID
SQL usually offers several ways to get the same result, and you can choose among them based on which one performs best.
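For what it's worth, the GROUP BY answers above are not quite equivalent: grouping AGENT alone drops offices that have no agents, while the LEFT JOIN from OFFICE keeps them with a count of 0. A minimal sketch, assuming made-up sample rows and table variables (@Office, @Agent) standing in for the real tables; the multi-row VALUES syntax needs SQL Server 2008 or later:

```sql
-- Hypothetical sample data; @Office and @Agent stand in for OFFICE and AGENT.
DECLARE @Office TABLE (OfficeID NVARCHAR(5) PRIMARY KEY);
DECLARE @Agent  TABLE (AgentID NVARCHAR(8) PRIMARY KEY, OfficeID NVARCHAR(5) NOT NULL);

INSERT INTO @Office (OfficeID) VALUES (N'O1'), (N'O2'), (N'O3');
INSERT INTO @Agent (AgentID, OfficeID) VALUES (N'A1', N'O1'), (N'A2', N'O1'), (N'A3', N'O2');

-- Grouping the agent table alone: O3 has no agents, so it produces no row.
SELECT a.OfficeID, COUNT(*) AS NumberOfAgents
FROM @Agent AS a
GROUP BY a.OfficeID
ORDER BY a.OfficeID;          -- O1 = 2, O2 = 1

-- LEFT JOIN from the office table: every office appears.
-- COUNT(a.AgentID) skips the NULLs produced for offices without agents.
SELECT o.OfficeID, COUNT(a.AgentID) AS NumberOfAgents
FROM @Office AS o
LEFT JOIN @Agent AS a ON a.OfficeID = o.OfficeID
GROUP BY o.OfficeID
ORDER BY o.OfficeID;          -- O1 = 2, O2 = 1, O3 = 0
```

Which of the two is right depends on whether empty offices should show up in the report.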

Related

Keyset Pagination - Filter By Search Term across Multiple Columns

I'm trying to move away from OFFSET/FETCH pagination to Keyset Pagination (also known as the Seek Method). Since I've only just started, I have many questions, but this one is about getting the pagination right together with a filter.
So I have 2 tables
aspnet_users
having columns
PK
UserId uniqueidentifier
Fields
UserName NVARCHAR(256) NOT NULL,
AffiliateTag varchar(50) NULL
.....other fields
aspnet_membership
having columns
PK+FK
UserId uniqueidentifier
Fields
Email NVARCHAR(256) NOT NULL
.....other fields
Indexes
Non Clustered Index on Table aspnet_users (UserName)
Non Clustered Index on Table aspnet_users (AffiliateTag)
Non Clustered Index on Table aspnet_membership(Email)
I have a page that lists the users (based on a search term) with the page size set to 20. I want to search across multiple columns, and I found that, instead of using OR, running a separate query for each column and then combining them with UNION makes the indexes get used correctly.
So I have a stored proc that takes the search term and, optionally, the UserName and UserId of the last record, for fetching the next page.
Create proc [dbo].[sp_searchuser]
@take int,
@searchTerm nvarchar(max) NULL,
@lastUserName nvarchar(256)=NULL,
@lastUserId nvarchar(256)=NULL
AS
IF(@lastUserName IS NOT NULL AND @lastUserId IS NOT NULL)
Begin
select top (@take) *
from
(
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.UserName like @searchTerm
UNION
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
where u1.UserName > @lastUserName
OR (u1.UserName=@lastUserName And u1.UserId > convert(uniqueidentifier, @lastUserId))
order by u1.UserName
End
Else
Begin
select top (@take) *
from
(
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.UserName like @searchTerm
UNION
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
order by u1.UserName
End
Now to get the result for first page with search term mua
exec [sp_searchuser] 20, 'mua%'
it uses both indexes, the one on the UserName column and the one on the AffiliateTag column, which is good.
But the problem is that the inner UNION queries return all the matching rows.
In this case, the execution plan shows:
UserName Like SubQuery
Number of Rows Read= 5
Actual Number of Rows= 4
AffiliateTag Like SubQuery
Number of Rows Read= 465
Actual Number of Rows= 465
So in total the inner queries return 469 matching rows,
and then the outer query takes 20 of them for the final result set. So it is really reading more data than needed.
And when I go to the next page:
exec [sp_searchuser] 20, 'mua%', 'lastUserName', 'lastUserId'
the execution plan shows
UserName Like SubQuery
Number of Rows Read= 5
Actual Number of Rows= 4
AffiliateTag Like SubQuery
Number of Rows Read= 465
Actual Number of Rows= 445
In total the inner queries return 449 matching rows,
so with or without pagination, it reads more data than needed.
My expectation is to somehow limit the inner queries so that they do not return all matching rows.
You might be interested in the Logical Processing Order, which determines when the objects defined in one step are made available to the clauses in subsequent steps. The Logical Processing Order steps are:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Of course, as noted in the docs:
The actual physical execution of the statement is determined by the
query processor and the order may vary from this list.
meaning that sometimes a step can start before the previous one completes.
In your case, your query looks like:
some data extraction
sort by user_name
get TOP records
There is no way to reduce the rows in the data extraction part: to get a deterministic result (for which we may actually need to order by user_name, user_id) we have to get all matching rows, sort them, and then take the desired rows.
For example, imagine the first query returning 20 names starting with 'Z' and the second query returning only one name starting with 'A'. If you somehow stopped the execution and skipped the second query, you would get wrong results: 20 names starting with 'Z' instead of one starting with 'A' and 19 with 'Z'.
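A small sketch of that point, using two hypothetical table variables in place of the two UNION branches and invented names:

```sql
-- Hypothetical stand-ins for the rows returned by each UNION branch.
DECLARE @ByUserName     TABLE (UserName NVARCHAR(256));
DECLARE @ByAffiliateTag TABLE (UserName NVARCHAR(256));

INSERT INTO @ByUserName (UserName)     VALUES (N'Zed01'), (N'Zed02'), (N'Zed03');
INSERT INTO @ByAffiliateTag (UserName) VALUES (N'Adam');

-- TOP is applied after ORDER BY, so the second branch has to be evaluated:
-- its single row sorts ahead of everything the first branch returned.
SELECT TOP (3) u.UserName
FROM (
    SELECT UserName FROM @ByUserName
    UNION
    SELECT UserName FROM @ByAffiliateTag
) AS u
ORDER BY u.UserName;   -- Adam, Zed01, Zed02
```

Taking the first three rows from @ByUserName alone would have returned Zed01, Zed02, Zed03 and missed Adam.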
In such cases, I prefer to use dynamic T-SQL statements in order to get better execution times and reduce the code length. You are saying:
And I want to search across multiple columns so instead of doing OR I
find out having a separate query for each and then Union them will
make the index use correctly.
When you are using UNION you are reading your tables twice. In your case, you are reading the aspnet_Membership table twice and the aspnet_Users table twice (yes, you are using two different indexes here, but I believe they are not covering, so you end up performing lookups to extract the user name and email).
I guess you could have started with a covering index like in the example below:
DROP TABLE IF EXISTS [dbo].[StackOverflow];
CREATE TABLE [dbo].[StackOverflow]
(
[UserID] INT PRIMARY KEY
,[UserName] NVARCHAR(128)
,[AffiliateTag] NVARCHAR(128)
,[UserEmail] NVARCHAR(128)
,[a] INT
,[b] INT
,[c] INT
,[z] INT
);
CREATE INDEX IX_StackOverflow_UserID_UserName_AffiliateTag_I_UserEmail ON [dbo].[StackOverflow]
(
[UserID]
,[UserName]
,[AffiliateTag]
)
INCLUDE ([UserEmail]);
GO
INSERT INTO [dbo].[StackOverflow] ([UserID], [UserName], [AffiliateTag], [UserEmail])
SELECT TOP (1000000) ROW_NUMBER() OVER(ORDER BY t1.number)
,CONCAT('UserName',ROW_NUMBER() OVER(ORDER BY t1.number))
,CONCAT('AffiliateTag', ROW_NUMBER() OVER(ORDER BY t1.number))
,CONCAT('UserEmail', ROW_NUMBER() OVER(ORDER BY t1.number))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
GO
So, for the following query:
SELECT TOP 20 [UserID]
,[UserName]
,[AffiliateTag]
,[UserEmail]
FROM [dbo].[StackOverflow]
WHERE [UserName] LIKE 'UserName200%'
OR [AffiliateTag] LIKE 'UserName200%'
ORDER BY [UserName];
GO
The issue here is that we are reading all the rows even though we are using the index.
What's good is that the index is covering, so we are not performing lookups. Depending on the search criteria, it may perform better than your approach.
If the performance is bad, we can use a trigger to UNPIVOT the original data and record it in a separate table. It may look like this (it would be better to use an attribute_id rather than text as I do here):
DROP TABLE IF EXISTS [dbo].[StackOverflowAttributes];
CREATE TABLE [dbo].[StackOverflowAttributes]
(
[UserID] INT
,[AttributeName] NVARCHAR(128)
,[AttributeValue] NVARCHAR(128)
,PRIMARY KEY([UserID], [AttributeName], [AttributeValue])
);
GO
CREATE INDEX IX_StackOverflowAttributes_AttributeValue ON [dbo].[StackOverflowAttributes]
(
[AttributeValue]
)
INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])
SELECT [UserID]
,'Name'
,[UserName]
FROM [dbo].[StackOverflow]
UNION
SELECT [UserID]
,'AffiliateTag'
,[AffiliateTag]
FROM [dbo].[StackOverflow];
and the earlier query will then look like:
SELECT TOP 20 U.[UserID]
,U.[UserName]
,U.[AffiliateTag]
,U.[UserEmail]
FROM [dbo].[StackOverflowAttributes] A
INNER JOIN [dbo].[StackOverflow] U
ON A.[UserID] = U.[UserID]
WHERE A.[AttributeValue] LIKE 'UserName200%'
ORDER BY U.[UserName];
Now we are reading only a part of the index rows and performing a lookup after that.
To compare performance, it is better to use:
SET STATISTICS IO, TIME ON;
as it shows how many pages are read from the indexes. The result can be visualized here.
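A sketch of how that comparison might be run in SSMS, assuming the [dbo].[StackOverflow] test table from the example above exists; the read counts and timings show up on the Messages tab, not in the result grid:

```sql
-- Report logical/physical reads and CPU/elapsed time for each statement.
SET STATISTICS IO, TIME ON;

-- Candidate query under test (the OR version from above).
SELECT TOP 20 [UserID]
      ,[UserName]
      ,[AffiliateTag]
      ,[UserEmail]
FROM [dbo].[StackOverflow]
WHERE [UserName] LIKE 'UserName200%'
   OR [AffiliateTag] LIKE 'UserName200%'
ORDER BY [UserName];

-- Switch the reporting off again so later statements are not affected.
SET STATISTICS IO, TIME OFF;
```

Running each candidate query between the same ON/OFF pair gives "logical reads" figures that can be compared directly.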

SQL Server & SSMS 2012 - Move a value from one column to a new one to ensure only one row

This is a problem that has troubled me several times in the past, and I have always wondered whether a solution is possible.
I have a query using several tables; one of the values is a mobile phone number.
I have name, address, etc. I also have income information in the table, which is used for a summary in Excel.
The problem occurs when a contact has more than one mobile number: as you know, this creates extra rows with the majority of the data duplicated, including the income.
Question: is it possible for the query to identify whether the contact has more than one number and, if so, create a new column with the 2nd mobile number?
Effectively, this would return the contact's information in one row and create new columns.
My SQL is intermediate and I cannot think of a solution, so I thought I would ask.
Many thanks
I am pretty sure this isn't the best possible solution, since we don't know how many records you have in your dataset and I didn't have much time, so treat it as just an idea of how you can solve your original problem of two different numbers for the same customer.
declare @t table (id int
,firstName varchar(20)
,lastName varchar(20)
,phoneNumber varchar(20)
,income money)
insert into @t values
(1,'John','Doe','1234567',50)
,(1,'John','Doe','6789856',50)
,(2,'Mike','Smith','5687456',150)
,(3,'Stela','Hodhson','3334445',500)
,(4,'Nick','Slotter','5556667',550)
,(4,'Nick','Slotter','8889991',550)
,(5,'Abraham','Lincoln','4578912',52)
,(6,'Ronald','Regan','6987456',587)
,(7,'Thomas','Jefferson','8745612',300);
with a as(
select id
,max(phoneNumber) maxPhone
from @t group by id
),
b as(
select id
,min(phoneNumber) minPhone
from @t group by id
)
SELECT distinct t.id
,t.firstName
,t.lastName
,t.income
,a.maxPhone as phoneNumber1
,case when b.minPhone = a.maxPhone then ''
else b.minphone end as phoneNumber2
from @t t
inner join a a on a.id = t.id
inner join b b on b.id = t.id
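If a contact can have more than two numbers, a variation that may be easier to extend is ROW_NUMBER() with conditional aggregation (a different technique from the MIN/MAX approach above). A sketch, assuming a made-up @contacts table variable with the same layout and a shortened data set:

```sql
-- Hypothetical sample data, same shape as the table above.
DECLARE @contacts TABLE (id INT, firstName VARCHAR(20), lastName VARCHAR(20),
                         phoneNumber VARCHAR(20), income MONEY);
INSERT INTO @contacts VALUES
 (1, 'John', 'Doe',   '1234567', 50)
,(1, 'John', 'Doe',   '6789856', 50)
,(2, 'Mike', 'Smith', '5687456', 150);

-- Number each contact's phones, then fold them back into one row per contact.
WITH numbered AS (
    SELECT id, firstName, lastName, income, phoneNumber,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY phoneNumber) AS rn
    FROM @contacts
)
SELECT id, firstName, lastName, income,
       MAX(CASE WHEN rn = 1 THEN phoneNumber END) AS phoneNumber1,
       MAX(CASE WHEN rn = 2 THEN phoneNumber END) AS phoneNumber2  -- NULL when there is no 2nd number
FROM numbered
GROUP BY id, firstName, lastName, income
ORDER BY id;
-- 1 John Doe   50.00  1234567 6789856
-- 2 Mike Smith 150.00 5687456 NULL
```

A third number would only need one more `MAX(CASE WHEN rn = 3 ...)` column.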

The Recursive CTE to query all ancestors of a Parent-Child table is slow

We have a self-referenced table like this
CREATE TABLE Categories(
Id int IDENTITY(1,1) NOT NULL,
Title nvarchar(200) NOT NULL,
ParentId int NULL,
CONSTRAINT PK_Structures PRIMARY KEY CLUSTERED
(
Id ASC
)
);
CREATE NONCLUSTERED INDEX IX_Structures_ParentId ON Categories
(
ParentId ASC
)
And a recursive cte to get all ancestors:
Create View Ancestors
as
with A(Id, ParentId) as
(
select Id, Id from Categories
union all
select e.Id, p.ParentId from Categories e
join A p on e.ParentId = p.Id
)
select * from A
Now we query all ancestors of a given Category like:
select * from Ancestors where Id = 1234
It takes 11 seconds for a table containing just 100000 categories. The query returns 5 rows for the given Id.
I know I can greatly improve the performance by using hierarchyid, and I also know that a WHILE loop can sometimes perform better, but in a simple case like this I expect to see much better performance.
Also, please note that I already have an index on ParentId.
(The picture shows the table structure which is the actual name of the Category table mentioned in the question.)
Is there any tuning that would greatly improve this performance?
Well, it turns out the reason for the slowness and the fix are far more interesting than anticipated.
SQL Server optimizes queries based on their definition, not on whatever semantic meaning they might have. The view in the question starts with all categories and adds new rows by joining the rows already in the CTE to their children. To find all the rows in which a given category appears as the child, SQL Server has to compute the whole result and then filter it. Only a human reader sees that the query computes all descendants of every category, which necessarily contains all ancestors of every category, and that you could instead start from the bottom and find parents recursively. This is not apparent from the query definition, only from its semantic meaning.
Rewriting the view as follows will make it fast:
Create View Ancestors
as
with A(Id, ParentId) as
(
select Id, Id from Categories
union all
select p.Id, e.ParentId from Categories e
join A p on e.Id = p.ParentId
)
select * from A
This view creates almost the same result as the view in question. The only difference is that it also shows null as an ancestor for all Categories, which makes no difference for our usage.
This view starts to build the hierarchy from bottom and goes up, which is compatible with the way we intend to query it.
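A self-contained sketch of the bottom-up version on a tiny, made-up hierarchy (a @Categories table variable with four rows, not the real table), showing the extra NULL row mentioned above:

```sql
-- Hypothetical 4-row hierarchy: 1 is the root, 2 is under 1, 3 and 4 are under 2.
DECLARE @Categories TABLE (Id INT PRIMARY KEY, ParentId INT NULL);
INSERT INTO @Categories (Id, ParentId) VALUES (1, NULL), (2, 1), (3, 2), (4, 2);

WITH A (Id, ParentId) AS
(
    -- Anchor: every category is its own first "ancestor".
    SELECT Id, Id FROM @Categories
    UNION ALL
    -- Step up: replace the current ancestor with that ancestor's parent.
    SELECT p.Id, e.ParentId
    FROM @Categories AS e
    JOIN A AS p ON e.Id = p.ParentId
)
SELECT Id, ParentId
FROM A
WHERE Id = 3
ORDER BY ParentId;
-- (3, NULL), (3, 1), (3, 2), (3, 3)
```

The recursion stops after the NULL row because `e.Id = NULL` never matches.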
What does the execution plan look like if you put the filter condition inside the CTE?
with A(Id, ParentId) as
(
select Id, Id
from Categories
WHERE Categories.ID = 1234
union all
select e.Id, p.ParentId
from Categories e
join A p on e.ParentId = p.Id
)
select *
from A;

Selecting count without Group By

I have the following table in SQL Server 2005. One order can have multiple containers. A container can be either plastic or wood (new types may come in the future).
I need to list the following columns -
OrderID, ContainerType, ContainerCOUNT and ContainerID.
Since I need to list the ContainerID also, the following group by approach won’t work.
DECLARE @OrderCoarntainers TABLE (OrderID INT, ContainerID INT, ContainerType VARCHAR(10))
INSERT INTO @OrderCoarntainers VALUES (1,101,'Plastic')
INSERT INTO @OrderCoarntainers VALUES (1,102,'Wood')
INSERT INTO @OrderCoarntainers VALUES (1,103,'Wood')
INSERT INTO @OrderCoarntainers VALUES (2,104,'Plastic')
SELECT OrderID,ContainerType,COUNT(DISTINCT ContainerID) AS ContainerCOUNT
FROM @OrderCoarntainers
GROUP BY OrderID,ContainerType
What is the best way to achieve this?
Note: Upgrading SQL Server version is not an option for me.
Expected Result
You should be able to use a windowed function
SELECT OrderID,
ContainerType,
COUNT(ContainerID) OVER (PARTITION BY OrderID, ContainerType) AS ContainerCOUNT,
ContainerID
FROM @OrderCoarntainers
I really don't know the SQL Server dialect of SQL that well, but I can suggest something that is pretty basic and may work. It relies on a self-join, which is not optimal for performance but will get the job done if the table is not huge or performance is not critical. Really, the problem here is that the table design is pretty bad for the data you are managing, as this should not all be in one table. But anyway:
SELECT o1.OrderID, o1.ContainerType, count(o2.ContainerID) AS ContainerCOUNT, o1.ContainerID
FROM @OrderCoarntainers o1 JOIN @OrderCoarntainers o2
ON o1.OrderID = o2.OrderID AND o1.ContainerType = o2.ContainerType
GROUP BY o1.OrderID, o1.ContainerType, o1.ContainerID

Is recursion good in SQL Server?

I have a table in SQL server that has the normal tree structure of Item_ID, Item_ParentID.
Suppose I want to iterate and get all CHILDREN of a particular Item_ID (at any level).
Recursion seems an intuitive candidate for this problem and I can write an SQL Server function to do this.
Will this affect performance if my table has many many records?
How do I avoid recursion and simply query the table? Please any suggestions?
With the new MS SQL 2005 you could use the WITH keyword.
Check out this question and particularly this answer.
With Oracle you could use CONNECT BY keyword to generate hierarchical queries (syntax).
AFAIK with MySQL you'll have to use the recursion.
Alternatively you could always build a cache table for your records parent->child relationships
As a general answer, it is possible to do some pretty sophisticated stuff in SQL Server that would normally need recursion, simply by using an iterative algorithm. I managed to write an XHTML parser in Transact-SQL that worked surprisingly well. The code prettifier I wrote was done in a stored procedure. It ain't elegant, it is rather like watching buffalo doing ballet, but it works.
Are you using SQL 2005?
If so you can use Common Table Expressions for this. Something along these lines:
;
with CTE (Some, Columns, ItemId, ParentId) as
(
select Some, Columns, ItemId, ParentId
from myTable
where ItemId = @itemID
union all
select a.Some, a.Columns, a.ItemId, a.ParentId
from myTable as a
inner join CTE as b on a.ParentId = b.ItemId
where a.ItemId <> b.ItemId
)
select * from CTE
The problem you will face with recursion and performance is how many times it will have to recurse to return the results. Each recursive call is another separate call that will have to be joined into the total results.
In SQL 2k5 you can use a common table expression to handle this recursion:
WITH Managers AS
(
--initialization
SELECT EmployeeID, LastName, ReportsTo
FROM Employees
WHERE ReportsTo IS NULL
UNION ALL
--recursive execution
SELECT e.employeeID,e.LastName, e.ReportsTo
FROM Employees e INNER JOIN Managers m
ON e.ReportsTo = m.employeeID
)
SELECT * FROM Managers
Another solution is to flatten the hierarchy into another table:
Employee_Managers
ManagerId (PK, FK to Employee table)
EmployeeId (PK, FK to Employee table)
All the parent-child relationships, direct and indirect, would be stored in this table, so if Manager 1 manages Manager 2, who manages Employee 3, the table would look like:
ManagerId EmployeeId
1 2
1 3
2 3
This allows the hierarchy to be easily queried:
select * from employee_managers em
inner join employee e on e.employeeid = em.employeeid and em.managerid = 42
This would return all employees that have manager 42, directly or indirectly. The upside is better query performance; the downside is having to maintain the flattened hierarchy.
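One way such a table might be populated from the ordinary parent pointer is a recursive CTE run once (or whenever the hierarchy changes). A sketch under assumptions: @Employee and its ReportsTo column are hypothetical stand-ins, and the three rows model the Manager 1 / Manager 2 / Employee 3 example:

```sql
-- Hypothetical adjacency list: 1 is the top manager, 2 reports to 1, 3 reports to 2.
DECLARE @Employee TABLE (EmployeeId INT PRIMARY KEY, ReportsTo INT NULL);
INSERT INTO @Employee (EmployeeId, ReportsTo) VALUES (1, NULL), (2, 1), (3, 2);

WITH Pairs (ManagerId, EmployeeId) AS
(
    -- Direct manager of every employee that has one.
    SELECT ReportsTo, EmployeeId
    FROM @Employee
    WHERE ReportsTo IS NOT NULL
    UNION ALL
    -- Climb one level: the manager's own manager also manages the employee.
    SELECT e.ReportsTo, p.EmployeeId
    FROM Pairs AS p
    JOIN @Employee AS e ON e.EmployeeId = p.ManagerId
    WHERE e.ReportsTo IS NOT NULL
)
SELECT ManagerId, EmployeeId
FROM Pairs
ORDER BY ManagerId, EmployeeId;
-- (1, 2), (1, 3), (2, 3)
```

In practice the SELECT would feed an INSERT into Employee_Managers, so the recursion cost is paid when the hierarchy changes and not on every read.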
Joe Celko has a book specifically on tree structures in SQL databases. While you would need recursion for your model, and there would definitely be a potential for performance issues there, there are alternative ways to model a tree structure, depending on what your specific problem involves, which could avoid recursion and give better performance.
Perhaps some more detail is in order.
If you have a master-detail relationship as you describe, then won't a simple JOIN get what you need?
As in:
SELECT
SOME_FIELDS
FROM
MASTER_TABLE MT
,CHILD_TABLE CT
WHERE CT.PARENT_ID = MT.ITEM_ID
You shouldn't need recursion for children - you're only looking at the level directly below (i.e. select * from T where ParentId = @parent) - you only need recursion for all descendants.
In SQL2005 you can get the descendants with:
with AllDescendants (ItemId, ItemText) as (
-- anchor: the starting item
select t.ItemId, t.ItemText
from [TableName] t
where t.ItemId = @ancestorId
union all
-- recursive step: children of the rows found so far
select sub.ItemId, sub.ItemText
from [TableName] sub
inner join AllDescendants tree
on tree.ItemId = sub.ParentItemId
)
select * from AllDescendants
You don't need recursion at all....
Note, I changed columns to ItemID and ItemParentID for ease of typing...
DECLARE @intLevel INT
SET @intLevel = 1
INSERT INTO TempTable(ItemID, ItemParentID, Level)
SELECT ItemID, ItemParentID, @intLevel
FROM Items -- placeholder name for the source table; the original omitted the FROM clause
WHERE ItemParentID IS NULL
WHILE @intLevel < @TargetLevel
BEGIN
SET @intLevel = @intLevel + 1
INSERT INTO TempTable(ItemID, ItemParentID, Level)
SELECT ItemID, ItemParentID, @intLevel
FROM Items
WHERE ItemParentID IN (SELECT ItemID FROM TempTable WHERE Level = @intLevel-1)
-- If no rows are inserted then there are no children
IF @@ROWCOUNT = 0
BREAK
END
SELECT ItemID FROM TempTable WHERE Level = @TargetLevel

Resources