MSSQL/TSQL Joining against a subquery - sql-server

I'm analyzing IIS log files from SharePoint and need to match each entry to its SPWeb.
This SQL code works for a single value (@var1):
DECLARE @var1 varchar(128);
set @var1 = '/sites/Site1/Subsite1/Subsite2/Documents/marketing.docx';
select
TOP 1 *,
charindex(urlstub, @var1) as found
from
spwebs
where
charindex(urlstub, @var1) = 1
order by
urlstub DESC;
I'm looking for a way to get this to work for a table's worth of data instead of just the single variable @var1.
Example data
SPwebs:
/sites/Site1
/sites/Site1/Subsite1
/sites/Site1/Subsite1/Subsite2
/sites/Site2
etc..
IISlog: (this is the table I'd like to take the place of @var1 above)
/sites/Site1/Subsite1/Subsite2/Documents/marketing.docx
/sites/Site1/Subsite1/Subsite2/Documents/sales.docx
/sites/Site1/Subsite1/Subsite2/Documents/hr.docx
/sites/Site1/research/funding.docx
The expected outcome of the above would be:
Foreach record in the IISLog table:
Find the best/deepest matching record from the spwebs table:
|table | matchingSPweb |
|---------------------------------------------------------| --------------------------------|
| /sites/Site1/Subsite1/Subsite2/Documents/marketing.docx | /sites/Site1/Subsite1/Subsite2/ |
| /sites/Site1/Subsite1/Subsite2/Documents/sales.docx | /sites/Site1/Subsite1/Subsite2/ |
| /sites/Site1/Subsite1/Subsite2/Documents/hr.docx | /sites/Site1/Subsite1/Subsite2/ |
| /sites/Site1/research/funding.docx | /sites/Site1 |
I've tried
select iislogs2.*, spwebs.urlstub
from
iislogs2
inner join
(
select TOP 1 urlstub, csURIStem as found
from spwebs
where charindex(urlstub, iislogs2.csUriStem) = 1
order by urlstub DESC
) as x
on x.csuristem = iislogs2.csUriStem
but this just errors; it doesn't seem to recognize iislogs2.csUriStem inside the subquery.

The easiest ways to fix your issue are either to change your current query to use a subquery in the select statement, e.g.:
SELECT iislogs2.*,
urlstub = (SELECT TOP 1 urlstub FROM spwebs WHERE CHARINDEX(urlstub, iislogs2.csUriStem) = 1 ORDER BY urlstub DESC)
from iislogs2;
... or change your current join to a cross apply, e.g.:
SELECT iislogs2.*, x.urlstub
from iislogs2
cross apply (SELECT TOP 1 urlstub FROM spwebs WHERE CHARINDEX(urlstub, iislogs2.csUriStem) = 1 ORDER BY urlstub DESC) AS x;
EDIT:
The query optimiser might do all sorts of weird sorts and spools, so one option to avoid that might be to use an explicit join with a CTE and then left join this back to your original table. For example:
;WITH CTE AS
(
SELECT i.csUriStem, s.urlstub, RN = ROW_NUMBER() OVER (PARTITION BY i.csUriStem ORDER BY s.urlstub DESC)
FROM iislogs2 AS i
JOIN spwebs AS s
ON i.csUriStem LIKE s.urlstub + '%'
)
SELECT i.*, c.urlstub
FROM iislogs2 AS i
LEFT JOIN CTE AS c
ON c.csUriStem = i.csUriStem
AND c.RN = 1;
Unfortunately, with strings and substrings, it's hard to get an execution plan that is really optimal for what you want to do, but I expect this sort of query will perform better with indexes than the other two.
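For anyone who wants to try the APPLY approach without the real tables, here is a minimal, self-contained sketch. The table variables @spwebs and @iislogs2 are stand-ins filled with sample rows from the question plus one made-up non-matching path, and the switch to OUTER APPLY and to ORDER BY LEN(urlstub) DESC are my own assumptions rather than part of the answer above:

```sql
-- Stand-in tables holding sample data (names and sizes are illustrative).
DECLARE @spwebs TABLE (urlstub varchar(128) NOT NULL);
DECLARE @iislogs2 TABLE (csUriStem varchar(256) NOT NULL);

INSERT INTO @spwebs (urlstub) VALUES
    ('/sites/Site1'),
    ('/sites/Site1/Subsite1'),
    ('/sites/Site1/Subsite1/Subsite2'),
    ('/sites/Site2');

INSERT INTO @iislogs2 (csUriStem) VALUES
    ('/sites/Site1/Subsite1/Subsite2/Documents/marketing.docx'),
    ('/sites/Site1/research/funding.docx'),
    ('/other/path/file.docx');   -- made-up row that matches no SPWeb

-- OUTER APPLY keeps log rows that have no match (matchingSPweb is NULL);
-- CROSS APPLY would drop them.
SELECT i.csUriStem, x.urlstub AS matchingSPweb
FROM @iislogs2 AS i
OUTER APPLY (
    SELECT TOP (1) s.urlstub
    FROM @spwebs AS s
    WHERE CHARINDEX(s.urlstub, i.csUriStem) = 1   -- urlstub is a prefix of the logged path
    ORDER BY LEN(s.urlstub) DESC                  -- longest prefix = deepest site
) AS x;
```

With this data the first path should map to /sites/Site1/Subsite1/Subsite2, the second to /sites/Site1, and the third to NULL. Note that a plain prefix test would also match /sites/Site10/... against /sites/Site1; testing against urlstub + '/' is one way to tighten that if it matters for your data.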

Related

TSQL Combine multiple sub queries on same table

I've been trying to improve a SQL query which uses multiple sub queries over the same table but with different conditions and only retrieves the first result from each sub queries.
I will try to simplify the use-case :
I have a table Products like this:
| Product_id | reference | field3 | field4 |
|------------|-----------|--------|--------|
| 1 | ref1 | val1 | val3 |
| 2 | ref2 | val2 | val4 |
And another table History:
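| History_id | reference | utilcode | physicalcode | issue | media | datetime |
|------------|-----------|----------|--------------|-------|-------|----------|
| 1 | ref1 | 'test' | 'TST' | '0' | '&audio' | 'a_date' |
| 2 | ref2 | 'phone' | 'CALLER' | '1' | '&video' | 'a_date' |
| 3 | ref2 | 'test' | 'CALLER' | '2' | '&test' | 'a_date' |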
History is a log table and therefore contains a lot of values.
Now I have a query like this
SELECT
p.reference,
p.field3, p.field4,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 0
ORDER BY a_date DESC) AS latest_date_issue_0,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 1
ORDER BY a_date DESC) AS latest_date_issue_1,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND utilcode = 'phone'
ORDER BY a_date DESC) AS latest_date_phone,
(SELECT TOP 1 media
FROM history h
WHERE h.reference = p.reference
AND utilcode = 'phone'
ORDER BY a_date DESC) AS latest_media
-- and so on with many possible combinations
-- Note that there are more than these few fields on the tables I work on.
FROM
products p
WHERE
p.field3 = 'valX'
AND p.field4 = 'valY'
How could I merge all of these subselects, or at least the ones that are alike, to improve performance?
History being a very big table, selecting from it multiple times drastically slows down the query.
The main problem is that I only need the first value every time.
Thank you for your time and I hope to find a better way to deal with this issue!
I tried to use ROW_NUMBER() but I could not find a suitable way to use it.
I also tried to build a temporary result set using WITH to group every possibility from history, but it was worse.
EDIT : Execution plan https://www.brentozar.com/pastetheplan/?id=Sy1AKIsUs
You can convert your correlated subqueries (you call them "subselects") to independent subqueries, then JOIN them. That way each subquery will only need to run once. I'll show you how to do this for your first subquery.
Here's a subquery replacing your first subquery.
SELECT reference, MAX(a_date) a_date
FROM history
WHERE physicalcode = 'TST'
AND issue = 0
GROUP BY reference
This gives a virtual table containing the latest date for each reference number from the history table matching the criteria in your question. A multicolumn index on history (physicalcode, issue, reference, a_date) makes this fast.
Then you can join it to the main table something like this:
SELECT
p.reference,
p.field3, p.field4,
a.a_date a_date_issue_0
FROM products p
LEFT JOIN ( /*the subquery */
SELECT reference, MAX(a_date) a_date
FROM history
WHERE physicalcode = 'TST'
AND issue = 0
GROUP BY reference
) a ON p.reference=a.reference
These subqueries can also be defined as VIEWs or Common Table Expressions (CTEs). If you have many of them you'll probably find it easier to read and reason about your query by doing them that way.
Your last subquery is a little trickier to handle this way. I suggest you work with this answer and then maybe ask another question about that.
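Another option worth testing, when several of the subqueries only differ in their filter, is conditional aggregation: read history once and let each MAX(CASE ...) pick out its own rows. The sketch below is self-contained and uses made-up table variables (@products, @history) and made-up dates, so treat it as an illustration under those assumptions rather than a drop-in replacement:

```sql
-- Stand-in tables with a few illustrative rows (not the poster's real schema).
DECLARE @products TABLE (reference varchar(20) PRIMARY KEY, field3 varchar(20), field4 varchar(20));
DECLARE @history TABLE (
    reference    varchar(20),
    utilcode     varchar(20),
    physicalcode varchar(20),
    issue        int,
    a_date       datetime
);

INSERT INTO @products (reference, field3, field4) VALUES
    ('ref1', 'val1', 'val3'),
    ('ref2', 'val2', 'val4');

INSERT INTO @history (reference, utilcode, physicalcode, issue, a_date) VALUES
    ('ref1', 'test',  'TST',    0, '20221101'),
    ('ref1', 'test',  'TST',    0, '20221115'),
    ('ref1', 'test',  'TST',    1, '20221110'),
    ('ref2', 'phone', 'CALLER', 1, '20221120');

-- One pass over history: each CASE returns a_date only for the rows that
-- belong to its "slot" and NULL otherwise; MAX ignores the NULLs.
SELECT
    p.reference, p.field3, p.field4,
    h.latest_date_issue_0,
    h.latest_date_issue_1,
    h.latest_date_phone
FROM @products AS p
LEFT JOIN (
    SELECT
        reference,
        MAX(CASE WHEN physicalcode = 'TST' AND issue = 0 THEN a_date END) AS latest_date_issue_0,
        MAX(CASE WHEN physicalcode = 'TST' AND issue = 1 THEN a_date END) AS latest_date_issue_1,
        MAX(CASE WHEN utilcode = 'phone' THEN a_date END)                AS latest_date_phone
    FROM @history
    GROUP BY reference
) AS h
    ON h.reference = p.reference;
```

This only covers "latest date" columns. Fetching another column from the latest row (such as media) still needs something like ROW_NUMBER() or an APPLY, and whether a single pass beats several indexed seeks depends on your data, so compare the execution plans.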
Thanks to @O.Jones I've been able to find a way to improve this query.
What I did to merge a few of the subqueries was to use a CTE, like so:
From
SELECT
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 0
ORDER BY a_date DESC) AS latest_date_issue_0,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 1
ORDER BY a_date DESC) AS latest_date_issue_1,
(SELECT top 1 a_date
FROM history h
WHERE h.reference = p.reference
AND h.physicalcode = 'TSTKO'
ORDER BY h.d_systeme DESC ) AS d_tst_ko,
(SELECT top 1 a_date
FROM history h
WHERE h.reference = p.reference
AND h.physicalcode = 'CALLERID'
ORDER BY h.d_systeme DESC ) AS d_wrong_number
FROM products p
To
WITH physicalcode_cte (reference, physicalcode, issue, a_date) as
(
SELECT reference, physicalcode, issue, max(a_date)
from history
where physicalcode in ('TST','TSTKO','CALLERID')
and a_date > dateadd(month, -4, getdate()) -- filter on date range to reduce number of rows
group by reference, physicalcode, issue
)
SELECT
date_issue_0.a_date AS latest_date_issue_0,
date_issue_1.a_date AS latest_date_issue_1,
tst_ko.a_date AS d_tst_ko,
wrong_number.a_date AS d_wrong_number
FROM products p
LEFT JOIN physicalcode_cte date_issue_0 on p.reference = date_issue_0.reference
AND date_issue_0.physicalcode = 'TST'
AND date_issue_0.issue = 0
LEFT JOIN physicalcode_cte date_issue_1 on p.reference = date_issue_1.reference
AND date_issue_1.physicalcode = 'TST'
AND date_issue_1.issue = 1
-- the CTE is grouped by issue, so the two joins below return one row per issue value
LEFT JOIN physicalcode_cte tst_ko on p.reference = tst_ko.reference
AND tst_ko.physicalcode = 'TSTKO'
LEFT JOIN physicalcode_cte wrong_number on p.reference = wrong_number.reference
AND wrong_number.physicalcode = 'CALLERID'
I've applied this idea to different scenarios and ended up with two CTEs.
I couldn't merge everything; sometimes merging increased the cost. But after several tests I've been able to go from a total cost of 7100 to 2100.
It is still a lot, but it is about a third of what it was. It takes 5 seconds instead of timing out.
It's a query used for monthly reports, so I don't need it to be super fast; I will keep it that way.
Thank you!

Custom Sort Order in CTE

I need to get a custom sort order in a CTE but the error shows
"The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified."
What's a better way to get the custom order in the CTE?
WITH
ctedivisiondesc
as
(
SELECT * FROM (
SELECT --TOP 1 --[APPID]
DH1.[ID_NUM]
--,[SEQ_NUM_2]
--,[CUR_DEGREE]
--,[NON_DEGREE_SEEKING]
,DH1.[DIV_CDE]
,DDF.DEGREE_DESC 'DivisionDesc'
--,[DEGR_CDE]
--,[PRT_DEGR_ON_TRANSC]
--,[ACAD_DEGR_CDE]
,[DTE_DEGR_CONFERRED]
--,MAX([DTE_DEGR_CONFERRED]) AS Date_degree_conferred
,ROW_NUMBER() OVER (
PARTITION BY [ID_NUM]
ORDER BY [DTE_DEGR_CONFERRED] DESC --Getting last degree
) AS [ROW NUMBER]
FROM [TmsePrd].[dbo].[DEGREE_HISTORY] As DH1
inner join [TmsePrd].[dbo].[DEGREE_DEFINITION] AS DDF
on DH1.[DEGR_CDE] = DDF.[DEGREE]
--ORDER BY
--DIV_CDE Level
--CE Continuing Education
--CT Certificate 1
--DC Doctor of Chiropractic 4
--GR Graduate 3
--PD Pending Division
--UG Undegraduate 2
--The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
ORDER BY CASE
WHEN DDF.DEGREE_DESC = 'Certificate' THEN 1
WHEN DDF.DEGREE_DESC = 'Undegraduate' THEN 2
WHEN DDF.DEGREE_DESC = 'Graduate' THEN 3
WHEN DDF.DEGREE_DESC = 'Doctor of Chiropractic' THEN 4
ELSE 5
END
) AS t
WHERE [ROW NUMBER] <= 1
)
SELECT * FROM ctedivisiondesc
You need to sort the outer query.
Sorting a subquery is not allowed because it is meaningless. Consider this simple example:
WITH CTE AS
( SELECT ID
FROM (VALUES (1), (2)) AS t (ID)
ORDER BY ID DESC
)
SELECT *
FROM CTE
ORDER BY ID ASC;
The ordering on the outer query has overridden the ordering on the inner query, rendering it a waste of time.
It is not just about explicit sorting of the outer query either; in more complex scenarios SQL Server may sort the subqueries any which way it wishes to enable merge joins, grouping, etc. So the only way to guarantee the order of a result is to order the outer query as you wish.
Since you may not have all the data you need in the outer query, you will probably need to create a further column inside the CTE to use for sorting, e.g.
WITH ctedivisiondesc AS
(
SELECT *
FROM ( SELECT DH1.ID_NUM,
DH1.DIV_CDE,
DDF.DEGREE_DESC AS DivisionDesc,
DTE_DEGR_CONFERRED,
ROW_NUMBER() OVER (PARTITION BY ID_NUM ORDER BY DTE_DEGR_CONFERRED DESC) AS [ROW NUMBER],
CASE
WHEN DDF.DEGREE_DESC = 'Certificate' THEN 1
WHEN DDF.DEGREE_DESC = 'Undegraduate' THEN 2
WHEN DDF.DEGREE_DESC = 'Graduate' THEN 3
WHEN DDF.DEGREE_DESC = 'Doctor of Chiropractic' THEN 4
ELSE 5
END AS SortOrder
FROM TmsePrd.dbo.DEGREE_HISTORY AS DH1
INNER JOIN TmsePrd.dbo.DEGREE_DEFINITION AS DDF
ON DH1.DEGR_CDE = DDF.DEGREE
) AS t
WHERE t.[ROW NUMBER] <= 1
)
SELECT ID_NUM,
DIV_CDE,
DivisionDesc,
DTE_DEGR_CONFERRED
FROM ctedivisiondesc
ORDER BY SortOrder;
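If the CASE expression gets long, one alternative is to keep the custom order in a small lookup and join to it. The following is a minimal sketch of that idea; the table variable @Degrees, its rows, and the inline VALUES lookup are made up for illustration and are not part of the original schema:

```sql
-- Made-up sample rows standing in for the result of the CTE.
DECLARE @Degrees TABLE (ID_NUM int PRIMARY KEY, DivisionDesc varchar(50));

INSERT INTO @Degrees (ID_NUM, DivisionDesc) VALUES
    (1, 'Graduate'),
    (2, 'Certificate'),
    (3, 'Doctor of Chiropractic'),
    (4, 'Undergraduate'),
    (5, 'Pending Division');

-- The lookup maps each description to its sort position; descriptions
-- that are not listed fall through to position 5 via COALESCE.
SELECT d.ID_NUM, d.DivisionDesc
FROM @Degrees AS d
LEFT JOIN (VALUES
        ('Certificate', 1),
        ('Undergraduate', 2),
        ('Graduate', 3),
        ('Doctor of Chiropractic', 4)
    ) AS s (DivisionDesc, SortOrder)
    ON s.DivisionDesc = d.DivisionDesc
ORDER BY COALESCE(s.SortOrder, 5), d.ID_NUM;   -- the ORDER BY lives on the outermost query
```

The same lookup could live in a real table if the order is reused elsewhere; either way, the sort is applied only in the final SELECT.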

Duplicated rows on getting Categories of a Publication

I have a table of Publications
Id | Title | Content ...
1 | 'Ex title 1' | 'example content 1'
2 | 'Ex title 2' | 'example content 2'
...
And a table of Categories
CategoryId | PublicationId
1 | 1
2 | 1
2 | 2
3 | 2
...
So a Publication could have one or many categories.
I am trying to get the first 10 publications and their categories on a single query, like that:
SELECT [Publication].Id, [Publication].Title, [Publication].Content, [PublicationCategory].CategoryId
FROM [Publication]
LEFT JOIN [PublicationCategory] ON [Publication].Id = [PublicationCategory].PublicationId
ORDER BY [Publication].Id DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
But I am getting duplicated rows because of the different category IDs. What is the best way to get 10 publications and their categories? Because each publication is repeated once per category, the 10 rows I get back cover fewer than 10 publications.
You can first pick the TOP 10 Publications and then put a JOIN with the Category table like following query to get all the categories.
SELECT [Publication].*,[PublicationCategory].[categoryid]
FROM
(
SELECT TOP 10 [Publication].id,
[Publication].title,
[Publication].content
FROM Publications [Publication]
ORDER BY [Publication].Id DESC
) [Publication]
INNER JOIN Categories [PublicationCategory]
ON [Publication].id = [PublicationCategory].publicationid
Use a CTE to number your publications, and then JOIN onto your table PublicationCategory and filter on the value of ROW_NUMBER():
WITH RNs AS(
SELECT P.Id, P.Title, P.Content,
ROW_NUMBER() OVER (ORDER BY P.ID DESC) AS RN
FROM Publication P)
SELECT RNs.Id, Rns.Title, RNs.Content,
PC.CategoryId
FROM RNs
LEFT JOIN PublicationCategory PC ON RNs.Id = PC.PublicationId
WHERE RNs.RN <= 10;
I think the best answer is @PSK's, but what if a publication is not categorized? (It's a weird case, but it could happen if it isn't validated.) You can use a LEFT JOIN instead so that you always get the 10 publications; a publication with no category still appears, with a NULL category.
SELECT [Publication].*,[PublicationCategory].[categoryid]
FROM
(
SELECT TOP 10 [Publication].id,
[Publication].title,
[Publication].content
FROM Publications [Publication]
ORDER BY [Publication].Id DESC
) [Publication]
LEFT JOIN Categories [PublicationCategory]
ON [Publication].id = [PublicationCategory].publicationid
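Since the question pages with OFFSET ... FETCH rather than TOP, the same idea can be sketched with the paging moved inside the derived table, so that it applies to publications and not to the joined rows. This is a self-contained illustration: the table variables, the third sample publication, and the @PageNumber/@PageSize parameters are hypothetical names I introduced, and OFFSET ... FETCH needs SQL Server 2012 or later:

```sql
-- Stand-in tables with sample rows (the third publication is made up and has no category).
DECLARE @Publication TABLE (Id int PRIMARY KEY, Title varchar(50), Content varchar(100));
DECLARE @PublicationCategory TABLE (CategoryId int, PublicationId int);

INSERT INTO @Publication (Id, Title, Content) VALUES
    (1, 'Ex title 1', 'example content 1'),
    (2, 'Ex title 2', 'example content 2'),
    (3, 'Ex title 3', 'example content 3');

INSERT INTO @PublicationCategory (CategoryId, PublicationId) VALUES
    (1, 1), (2, 1), (2, 2), (3, 2);

DECLARE @PageNumber int = 1, @PageSize int = 10;   -- hypothetical paging parameters

SELECT p.Id, p.Title, p.Content, pc.CategoryId
FROM (
    -- Page the publications first...
    SELECT Id, Title, Content
    FROM @Publication
    ORDER BY Id DESC
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY
) AS p
-- ...then attach categories, keeping uncategorized publications.
LEFT JOIN @PublicationCategory AS pc
    ON pc.PublicationId = p.Id
ORDER BY p.Id DESC, pc.CategoryId;
```

The result still has one row per publication/category pair, but each page now contains up to @PageSize distinct publications.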

SQL Update query using select statement

I am trying to update a column in a table with the top 1 value of another column, taken from the rows where a third column (ModelGroup) matches.
Hard to explain, but this is what I wrote:
UPDATE CameraSpecifications AS a
SET a.Variant = (
SELECT TOP 1 GTIN
FROM CameraSpecifcations
WHERE b.ModelGroup = a.ModelGroup )
Hopefully that explains what I am trying to do.
I have a select statement that might also help:
SELECT
(
SELECT TOP 1 b.GTIN
FROM CameraSpecifications AS b
WHERE b.ModelGroup = a.ModelGroup
) AS Gtin,
a.ModelGroup,
COUNT(a.ModelGroup)
FROM CameraSpecifications AS a
GROUP BY a.ModelGroup
We can try doing an update join from CameraSpecifications to a CTE which finds the top GTIN value for each model group. Note carefully that I use an ORDER BY clause in ROW_NUMBER. It makes no sense to use TOP 1 without ORDER BY, so you should at some point update your question and mention TOP 1 with regard to a certain column.
WITH cte AS (
SELECT ModelGroup, GTIN,
ROW_NUMBER() OVER (PARTITION BY ModelGroup ORDER BY some_col) rn
FROM CameraSpecifications
)
UPDATE cs
SET Variant = t.GTIN
FROM CameraSpecifications cs
INNER JOIN cte t
ON cs.ModelGroup = t.ModelGroup
WHERE
t.rn = 1;
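If "top 1" simply means the lowest GTIN in each model group (that is an assumption on my part, since the question gives no ordering), a plain MIN per group is enough and avoids the row numbering. A minimal, self-contained sketch using a made-up table variable and made-up values:

```sql
-- Stand-in table with illustrative rows.
DECLARE @CameraSpecifications TABLE (
    GTIN       varchar(14) PRIMARY KEY,
    ModelGroup varchar(20) NOT NULL,
    Variant    varchar(14) NULL
);

INSERT INTO @CameraSpecifications (GTIN, ModelGroup) VALUES
    ('0001', 'A'),
    ('0002', 'A'),
    ('0003', 'B');

-- Every row in a model group gets that group's lowest GTIN as its Variant.
UPDATE cs
SET Variant = m.FirstGtin
FROM @CameraSpecifications AS cs
INNER JOIN (
    SELECT ModelGroup, MIN(GTIN) AS FirstGtin
    FROM @CameraSpecifications
    GROUP BY ModelGroup
) AS m
    ON m.ModelGroup = cs.ModelGroup;

SELECT GTIN, ModelGroup, Variant
FROM @CameraSpecifications
ORDER BY GTIN;
```

Here both rows of group A end up with Variant '0001' and the single row of group B with '0003'. If the ordering depends on a different column than the one being copied, the ROW_NUMBER() version above is the better fit.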

Join the table valued function in the query

I have one table, vwuser. I want to join this table with the table-valued function fnuserrank(userID), so I need to use CROSS APPLY with the table-valued function:
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
For each userID it generates multiple records. I only want the last record for each empid that does not have a Rank of Term(inated). How can I do this?
Data:
HistoryID empid Rank MonitorDate
1 A1 E1 2012-8-9
2 A1 E2 2012-9-12
3 A1 Term 2012-10-13
4 A2 E3 2011-10-09
5 A2 TERM 2012-11-9
From this data, the 2nd and 4th records should be selected.
In SQL Server 2005+ you can use this Common Table Expression (CTE) to determine the latest record by MonitorDate that doesn't have a Rank of 'Term':
WITH EmployeeData AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY empId ORDER BY MonitorDate DESC) AS RowNumber
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
WHERE Rank != 'Term'
)
SELECT *
FROM EmployeeData AS ed
WHERE ed.RowNumber = 1;
Note: The statement before this CTE will need to end in a semi-colon. Because of this, I have seen many people write them like ;WITH EmployeeData AS...
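To check the logic against the sample data from the question, here is a small self-contained sketch. The table variable @UserRank stands in for the combined output of vwuser and fnuserrank (an assumption, since the real schema isn't shown), and UPPER() is there only so that 'Term' and 'TERM' are both excluded under a case-sensitive collation:

```sql
-- Stand-in for the rows produced by vwuser CROSS APPLY fnuserrank(...).
DECLARE @UserRank TABLE (
    HistoryID   int PRIMARY KEY,
    empid       varchar(10),
    [Rank]      varchar(10),
    MonitorDate date
);

INSERT INTO @UserRank (HistoryID, empid, [Rank], MonitorDate) VALUES
    (1, 'A1', 'E1',   '20120809'),
    (2, 'A1', 'E2',   '20120912'),
    (3, 'A1', 'Term', '20121013'),
    (4, 'A2', 'E3',   '20111009'),
    (5, 'A2', 'TERM', '20121109');

WITH Ranked AS
(
    SELECT HistoryID, empid, [Rank], MonitorDate,
           -- number each employee's non-terminated rows, newest first
           ROW_NUMBER() OVER (PARTITION BY empid ORDER BY MonitorDate DESC) AS RowNumber
    FROM @UserRank
    WHERE UPPER([Rank]) <> 'TERM'
)
SELECT HistoryID, empid, [Rank], MonitorDate
FROM Ranked
WHERE RowNumber = 1;
```

With this data the query returns HistoryID 2 and 4, which are the 2nd and 4th records the question asks for.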
You'll have to play with this. Having trouble mocking your schema on sqlfiddle.
Select foo.*
from
(
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
where rank != 'TERM'
) foo
left join
(
SELECT *
FROM vwuser AS b
CROSS APPLY fnuserrank(b.userid)
where rank != 'TERM'
) bar
on foo.empId = bar.empId
and foo.MonitorDate < bar.MonitorDate
where bar.empid is null
I always need to test which way the date comparison goes in these left outer joins. The way it works: every row EXCEPT one per user has at least one row with a higher monitor date, so the join finds a match for it. The one row with no match (bar is NULL) is the one you want. I usually use an example from my code, but I'm on the wrong laptop. To get it working, you can select foo.*, bar.* and look at the results, spot the row you want, and adjust the condition.
You could also do this, which is easier to remember
select foo.*
from
(
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
) foo
join
(
select empid, max(monitordate) maxdate
FROM vwuser AS b
CROSS APPLY fnuserrank(b.userid)
where rank != 'TERM'
group by empid
) bar
on foo.empid = bar.empid
and foo.monitordate = bar.maxdate
I usually prefer to use set based logic over aggregate functions, but whatever works. You can tweak it also by caching the results of your TVF join into a table variable.
EDIT:
http://www.sqlfiddle.com/#!3/613e4/17 - I mocked up your TVF here. Apparently sqlfiddle didn't like "go".
select foo.*, bar.*
from
(
SELECT f.*
FROM vwuser AS a
join fnuserrank f
on a.empid = f.empid
where rank != 'TERM'
) foo
left join
(
SELECT f1.empid [barempid], f1.monitordate [barmonitordate]
FROM vwuser AS b
join fnuserrank f1
on b.empid = f1.empid
where rank != 'TERM'
) bar
on foo.empId = bar.barempid
and foo.MonitorDate < bar.barmonitordate
where bar.barempid is null

Resources