Using data explorer to create queries:
SELECT P.id, creationdate,tags,owneruserid,answercount
--SELECT DISTINCT TAGNAME ,TAGID
FROM TAGS AS T
JOIN POSTTAGS AS PT
ON T.ID = PT.TAGID
JOIN POSTS AS P
ON PT.POSTID = P.ID
--WHERE CAST(P.TAGS AS VARCHAR) IN('JAVA')
WHERE PT.TAGID = 3143
How is it possible to add pagination in the query in order to take not only the first 50,000 results, but then run the query again to take the next remaining results?
There are a few ways to "page" through TSQL results; see:
How to return a page of results from SQL?
and
SQL performance: WHERE vs WHERE(ROW_NUMBER)
Here I will use the CTE method as:
It uses convenient row numbers to page through results, rather than trying to track less predictable factors such as creationdate.
It reportedly performs faster than the OFFSET method.
So, that question's query becomes this SEDE query:
-- StartRow: Starting row for paging
-- EndRow: Ending row for paging (Max 50K rows at a time)
WITH allData AS (
SELECT
ROW_NUMBER() OVER (ORDER BY P.creationdate) AS row
, P.id
, P.creationdate
, P.tags
, P.owneruserid
, P.answercount
FROM Posttags AS PT
JOIN Posts AS P ON PT.postid = P.id
WHERE PT.tagid = 3143 -- tag [scala]
)
SELECT *
FROM allData
WHERE row >= ##StartRow:INT?1##
AND row <= ##EndRow:INT?50000##
ORDER BY row
Related
I have a problem with the optimization of this query, I have 3 tables (Products = Catalogo.GTIN, Sales Header = TEDEF.Factura and Sales Detail = TEDEF.Farmacia).
The query tries to find the Mode of the column VPRODEXENIGV_FAR. This query without the ORDER BY executes in less than 3 seconds (the table of details has about 30 million rows).
But when I add the ORDER BY clause, the query now takes more than 30 minutes to run.
I want to know how can I optimize this query or the indexes that I need to optimize this.
SELECT *
FROM Catalogo.GTIN G
CROSS APPLY
(SELECT TOP 1
COUNT(FAR.VPRODEXENIGV_FAR) [ROW],
YEAR(FAC2.VFECEMI_FAC) [AÑO],
MONTH(FAC2.VFECEMI_FAC) [MES],
FAR.VCODPROD_FAR_003,
CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM
TEDEF.Factura FAC2
INNER JOIN
TEDEF.Farmacia FAR ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
WHERE
G.CODIGO = FAR.VCODPROD_FAR_003
GROUP BY
YEAR(FAC2.VFECEMI_FAC),
MONTH(FAC2.VFECEMI_FAC),
FAR.VCODPROD_FAR_003,
FAR.VPRODEXENIGV_FAR
ORDER BY
1 DESC --- <----- THE PROBLEM IS HERE
) GG
Ouch! You have a hugely expensive dependent subquery. It's expensive because SELECT TOP(n) ... ORDER BY col DESC does a whole lot of work to create a result set only to discard all but one row. And, it's a dependent subquery so it's run for every row of Catalogo.GTIN .
It looks like you want to count the resultset rows in the most recent month and year for each Catalogo.GTIN row. So, let's try to refactor your query to do that.
We'll start with a subquery to grab the month-start date of the latest Factura row for each catalog entry.
SELECT CODIGO,
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes
FROM (
SELECT MAX(FAC2.VFECEMI_FAC) maxd,
G.CODIGO
FROM Catalogo.GTIN G
JOIN TDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
GROUP BY G.CODIGO
) maxd
It's wise to test this and make sure it works correctly and performs tolerably well. If you test it in SSMS, you can use "Show Actual Execution Plan" and see if it recommends an extra index. This subquery need only be run once, rather than once per G.CODIGO row.
Then we'll use it in your larger query.
SELECT G.*,
COUNT(FAR.VPRODEXENIGV_FAR) [ROW],
YEAR(FAC2.VFECEMI_FAC) [AÑO],
MONTH(FAC2.VFECEMI_FAC) [MES],
FAR.VCODPROD_FAR_003,
CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM Catalogo.GTIN G
JOIN (
SELECT CODIGO,
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes
FROM (
SELECT MAX(FAC2.VFECEMI_FAC) maxd,
G.CODIGO
FROM Catalogo.GTIN G
JOIN TDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
GROUP BY G.CODIGO
) maxd
) maxmes ON G.CODIGO = maxmes.CODIGO
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
AND FAC2.VFECEMI_FAC >= maxmes.maxmes
GROUP BY maxmes.maxmes,
G.CODIGO,
FAR.VCODPROD_FAR_003,
FAR.VPRODEXENIGV_FAR
Here is the tricky bit:
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes turns any date maxd into the first day of that month.
And, FAC2.VFECEMI_FAC >= maxmes.maxmes filters out rows before the first day of that month (for that CODIGO). It does so in a sargable way: a way that can exploit an index on FAC2.VFECEMI_FAC.
That is an alternative way to do TOP(1) ORDER BY d DESC. And faster.
It's all about sets of rows. Especially when using GROUP BY, it's performance-helpful to limit the number of rows in each set.
Obviously I cannot debug this.
Is me again, Finally i resolve the problem of the optimization, now the query delay is about 20 sec (with the sort instruction and with the count in a table over 30 million rows) i hope this way can help others or could be optimice more by the community.
I resolve the problem applying the sort but with the Row_Number instruction, in that way the server take my index for the sort instruction and make the magic:
WITH x
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GG.COD, GG.[AÑO], GG.[MES] ORDER BY GG.[ROW] DESC) [ID]
FROM Catalogo.GTIN G
CROSS APPLY
(
SELECT COUNT(FAR.VPRODEXENIGV_FAR) [ROW]
, YEAR(FAC2.VFECEMI_FAC) [AÑO]
, MONTH(FAC2.VFECEMI_FAC) [MES]
, FAR.VCODPROD_FAR_003 [COD]
, CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM TEDEF.Factura FAC2
INNER JOIN TEDEF.Farmacia FAR
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
WHERE G.CODIGO = FAR.VCODPROD_FAR_003
GROUP BY YEAR(FAC2.VFECEMI_FAC)
, MONTH(FAC2.VFECEMI_FAC)
, FAR.VCODPROD_FAR_003
, FAR.VPRODEXENIGV_FAR
-- ORDER BY 1 DESC --- <---- this is the bad guy, please, don't do that xD
) GG
) SELECT *
FROM x WHERE ID = 1
In that way i can sort the Count instruction and calculate the Mode for the Column FAR.VPRODEXENIGV_FAR
I have a query in SQL Server with 6 JOINs and 1 LEFT JOIN to tables and views. It returns 16k records in about 1 second if the select clause is "SELECT *"
As soon as I specify even one column to display (SELECT ItemID, for example) the query slows down to about 70 seconds.
Query #1 (2s) - SELECT *:
SELECT *
FROM (SELECT LinkedToSet, LinkedToCopy, ',' + STRING_AGG(LocationID,',') + ',' Locs, count(1) OVER (PARTITION BY LinkedToSet) Copies
FROM Inventory.Locations WHERE LinkedToSet is not null AND (State & 4096)>0 GROUP BY LinkedToSet, LinkedToCopy) l
JOIN Bricklink_Set_Query bsq on l.LinkedToSet=bsq.Number
JOIN Bricklink.Set_Parts_Query bsp on l.LinkedToSet=bsp.SetNum AND bsp.Extra=0
JOIN Bricklink.Item_List i on bsp.ItemType=i.ItemType AND bsp.ItemID=i.Number
JOIN Bricklink.Category_List cat on i.Category_ID=cat.CatID
JOIN Bricklink.Color_List col on bsp.ColorID=col.ColorID
LEFT JOIN (SELECT LocationID, ItemType, ItemNum, ColorID, sum(QtyFound) as InvPcs
FROM Inventory.Item_History
GROUP BY LocationID, ItemType, ItemNum, ColorID) as h ON l.Locs like concat('%,',h.locationID,',%') AND h.ItemType=bsp.ItemType AND h.ItemNum=bsp.ItemID AND h.ColorID=bsp.ColorID
Actual Execution Plan: https://www.brentozar.com/pastetheplan/?id=SJD7Qemf_
Query #2 (81s) - SELECT a single column
SELECT bsp.ItemID
FROM (SELECT LinkedToSet, LinkedToCopy, ',' + STRING_AGG(LocationID,',') + ',' Locs, count(1) OVER (PARTITION BY LinkedToSet) Copies
FROM Inventory.Locations WHERE LinkedToSet is not null AND (State & 4096)>0 GROUP BY LinkedToSet, LinkedToCopy) l
JOIN Bricklink_Set_Query bsq on l.LinkedToSet=bsq.Number
JOIN Bricklink.Set_Parts_Query bsp on l.LinkedToSet=bsp.SetNum AND bsp.Extra=0
JOIN Bricklink.Item_List i on bsp.ItemType=i.ItemType AND bsp.ItemID=i.Number
JOIN Bricklink.Category_List cat on i.Category_ID=cat.CatID
JOIN Bricklink.Color_List col on bsp.ColorID=col.ColorID
LEFT JOIN (SELECT LocationID, ItemType, ItemNum, ColorID, sum(QtyFound) as InvPcs
FROM Inventory.Item_History
GROUP BY LocationID, ItemType, ItemNum, ColorID) as h ON l.Locs like concat('%,',h.locationID,',%') AND h.ItemType=bsp.ItemType AND h.ItemNum=bsp.ItemID AND h.ColorID=bsp.ColorID
Actual execution plan: https://www.brentozar.com/pastetheplan/?id=BJTr4x7Gu
The execution plans look totally different from each other and I'm not sure why. I've also tried wrapping the SELECT * and querying that, but some of these tables/views have the exact same field names, especially on the joins, so SQL Server throws an error:
This column 'foo' was specified multiple times.
How do I achieve the performance of SELECT * but limit which columns I display?
P.S. 2 Notes - 1) My desired select statement is obviously more complex than this and 2) Even using the full select statement, if I add a WHERE clause and restrict the query there, it runs in <1 second. If that plan would be useful I can post it as well.
I am trying to combine two SQL queries the first is
SELECT
EAC.Person.FirstName,
EAC.Person.Id,
EAC.Person.LastName,
EAC.Person.EmployeeId,
EAC.Person.IsDeleted,
Controller.Cards.SiteCode,
Controller.Cards.CardCode,
Controller.Cards.ActivationDate,
Controller.Cards.ExpirationDate,
Controller.Cards.Status,
EAC.[Group].Name
FROM
EAC.Person
INNER JOIN
Controller.Cards ON EAC.Person.Id = Controller.Cards.PersonId
INNER JOIN
EAC.GroupPersonMap ON EAC.Person.Id = EAC.GroupPersonMap.PersonId
INNER JOIN
EAC.[Group] ON EAC.GroupPersonMap.GroupId = EAC.[Group].Id
And the second one is
SELECT
IsActive, ActivationDateUTC, ExpirationDateUTC,
Sitecode + '-' + Cardcode AS Credential, 'Badge' AS Type,
CASE
WHEN isActive = 0
THEN 'InActive'
WHEN ActivationDateUTC > GetUTCDate()
THEN 'Pending'
WHEN ExpirationDAteUTC < GetUTCDate()
THEN 'Expired'
ELSE 'Active'
END AS Status
FROM
EAC.Credential
JOIN
EAC.WiegandCredential ON Credential.ID = WiegandCredential.CredentialId
WHERE
PersonID = '32'
Where I would like to run the second query for each user of the first query using EAC.Person.Id instead of the '32'.
I would like all the data to be returned in one Dataset so I can use it in Report Builder.
I have been fighting with this all day and am hoping one of you smart guys can give me a hand. Thanks in advance.
Based on your description in the comments, I understand that the connection between the two datasets is actually the PersonID field, which exists in both EAC.Credential and EAC.Person; however, in EAC.Credential, duplicate values exist for PersonID, and you want only the most recent one for each PersonID.
There are a few ways to do this, and it will depend on the number of rows returned, the indexes, etc., but I think maybe you're looking for something like this...?
SELECT
EAC.Person.FirstName
,EAC.Person.Id
,EAC.Person.LastName
,EAC.Person.EmployeeId
,EAC.Person.IsDeleted
,Controller.Cards.SiteCode
,Controller.Cards.CardCode
,Controller.Cards.ActivationDate
,Controller.Cards.ExpirationDate
,Controller.Cards.Status
,EAC.[Group].Name
,X.IsActive
,X.ActivationDateUTC
,X.ExpirationDateUTC
,X.Credential
,X.Type
,X.Status
FROM EAC.Person
INNER JOIN Controller.Cards
ON EAC.Person.Id = Controller.Cards.PersonId
INNER JOIN EAC.GroupPersonMap
ON EAC.Person.Id = EAC.GroupPersonMap.PersonId
INNER JOIN EAC.[Group]
ON EAC.GroupPersonMap.GroupId = EAC.[Group].Id
CROSS APPLY
(
SELECT TOP 1
IsActive
,ActivationDateUTC
,ExpirationDateUTC
,Sitecode + '-' + Cardcode AS Credential
,'Badge' AS Type
,'Status' =
CASE
WHEN isActive = 0
THEN 'InActive'
WHEN ActivationDateUTC > GETUTCDATE()
THEN 'Pending'
WHEN ExpirationDateUTC < GETUTCDATE()
THEN 'Expired'
ELSE 'Active'
END
FROM EAC.Credential
INNER JOIN EAC.WiegandCredential
ON EAC.Credential.ID = EAC.WiegandCredential.CredentialId
WHERE EAC.Credential.PersonID = EAC.Person.PersonID
ORDER BY EAC.Credential.ID DESC
) AS X
-- Optionally, you can also add conditions to return specific rows, i.e.:
-- WHERE EAC.Person.PersonID = 32
This option uses a CROSS APPLY, which means that every row of the first dataset will return additional values from the second dataset, based on the criteria that you described. In this CROSS APPLY, I'm joining the two datasets based on the fact that PersonID exists in both EAC.Person (in your first dataset) as well as in EAC.Credential. I then specify that I want only the TOP 1 row for each PersonID, with an ORDER BY specifying that we want the most recent (highest) value of ID for each PersonID.
The CROSS APPLY is aliased as "X", so in your original SELECT you now have several values prefixed with the X. alias, which just means that you're taking these fields from the second query and attaching them to your original results.
CROSS APPLY requires that a matching entry exists in both subsets of data, much like an INNER JOIN, so you'll want to check and make sure that the relevant values exist and are returned correctly.
I think this is pretty close to the direction you're trying to go. If not, let me know and I'll update the answer. Good luck!
Try like this;
select Query1.*, Query2.* from (
SELECT
EAC.Person.FirstName,
EAC.Person.Id as PersonId,
EAC.Person.LastName,
EAC.Person.EmployeeId,
EAC.Person.IsDeleted,
Controller.Cards.SiteCode,
Controller.Cards.CardCode,
Controller.Cards.ActivationDate,
Controller.Cards.ExpirationDate,
Controller.Cards.Status,
EAC.[Group].Name
FROM
EAC.Person
INNER JOIN
Controller.Cards ON EAC.Person.Id = Controller.Cards.PersonId
INNER JOIN
EAC.GroupPersonMap ON EAC.Person.Id = EAC.GroupPersonMap.PersonId
INNER JOIN
EAC.[Group] ON EAC.GroupPersonMap.GroupId = EAC.[Group].Id)
Query1 inner join (SELECT top 100
IsActive, ActivationDateUTC, ExpirationDateUTC,
Sitecode + '-' + Cardcode AS Credential, 'Badge' AS Type,
CASE
WHEN isActive = 0
THEN 'InActive'
WHEN ActivationDateUTC > GetUTCDate()
THEN 'Pending'
WHEN ExpirationDAteUTC < GetUTCDate()
THEN 'Expired'
ELSE 'Active'
END AS Status
FROM
EAC.Credential
JOIN
EAC.WiegandCredential ON Credential.ID = WiegandCredential.CredentialId
ORDER BY EAC.Credential.ID DESC) Query2 ON Query1.PersonId = Query2.PersonID
Just select two queries to join them like Query1 and Query2 by equaling PersonId data.
I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (that starts with "select distinct address_id", with the MIN) works fine on it's own...no duplicates are returned. So it would seem that the left join (or just plain join...I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it's returning 2 values for some ipc.mfgid's, one with the h2.datecreated that I want, and the other one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id
The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
This subqery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
Probably will return multiple rows because the comment is not guaranteed to be the same.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.
Having issues getting a dataset to return with one date per client in the query.
Requirements:
Must have the recent date of transaction per client list for user
Will need have the capability to run through EXEC
Current Query:
SELECT
c.client_uno
, c.client_code
, c.client_name
, c.open_date
into #AttyClnt
from hbm_client c
join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = #login
and c.status_code = 'C'
select
ba.payr_client_uno as client_uno
, max(ba.tran_date) as tran_date
from blt_bill_amt ba
left outer join #AttyClnt ac on ba.payr_client_uno = ac.client_uno
where ba.tran_type IN ('RA', 'CR')
group by ba.payr_client_uno
Currently, this query will produce at least 1 row per client with a date, the problem is that there are some clients that will have between 2 and 10 dates associated with them bloating the return table to about 30,000 row instead of an idealistic 246 rows or less.
When i try doing max(tran_uno) to get the most recent transaction number, i get the same result, some have 1 value and others have multiple values.
The bigger picture has 4 other queries being performed doing other parts, i have only included the parts that pertain to the question.
Edit (2011-10-14 # 1:45PM):
select
ba.payr_client_uno as client_uno
, max(ba.row_uno) as row_uno
into #Bills
from blt_bill_amt ba
inner join hbm_matter m on ba.matter_uno = m.matter_uno
inner join hbm_client c on m.client_uno = c.client_uno
inner join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = #login
and c.status_code = 'C'
and ba.tran_type in ('CR', 'RA')
group by ba.payr_client_uno
order by ba.payr_client_uno
--Obtain list of Transaction Date and Amount for the Transaction
select
b.client_uno
, ba.tran_date
, ba.tc_total_amt
from blt_bill_amt ba
inner join #Bills b on ba.row_uno = b.row_uno
Not quite sure what was going on but seems the Temp Tables were not acting right at all. Ideally i would have 246 rows of data, but with the previous query syntax it would produce from 400-5000 rows of data, obviously duplications on data.
I think you can use ranking to achieve what you want:
WITH ranked AS (
SELECT
client_uno = ba.payr_client_uno,
ba.tran_date,
be.tc_total_amt,
rnk = ROW_NUMBER() OVER (
PARTITION BY ba.payr_client_uno
ORDER BY ba.tran_uno DESC
)
FROM blt_bill_amt ba
INNER JOIN hbm_matter m ON ba.matter_uno = m.matter_uno
INNER JOIN hbm_client c ON m.client_uno = c.client_uno
INNER JOIN hbm_persnl p ON c.resp_empl_uno = p.empl_uno
WHERE p.login = #login
AND c.status_code = 'C'
AND ba.tran_type IN ('CR', 'RA')
)
SELECT
client_uno,
tran_date,
tc_total_amt
FROM ranked
WHERE rnk = 1
ORDER BY client_uno
Useful reading:
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
WITH common_table_expression (Transact-SQL)
Using Common Table Expressions