Can someone help me make this SQL query more efficient? - sql-server

SELECT
datepart(qq, o.created_date),
count(DISTINCT o.order_id),
sum(o.order_margin)
FROM
orders o
WHERE
o.account_id IN (SELECT e.account_id
FROM emailsegment e
WHERE e.segment = 'H')
AND o.created_date >= '1/1/2016'
AND o.order_status = 'Shipped'
GROUP BY
datepart(qq, o.created_date)
ORDER BY
datepart(qq, o.created_date)
This is taking forever to run, any ideas?

Try this:
SELECT
datepart(qq, o.created_date),
count(DISTINCT o.order_id),
sum(o.order_margin)
FROM
orders o
INNER JOIN
emailsegment e ON e.account_id = o.account_id AND e.segment = 'H'
WHERE
o.created_date >= '2016-01-01'
AND o.order_status = 'Shipped'
GROUP BY
datepart(qq, o.created_date)
ORDER BY
datepart(qq, o.created_date)
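One thing worth checking before swapping IN for a JOIN: if emailsegment can hold more than one row per account_id for segment 'H', the join repeats each order row. sum(o.order_margin) is then inflated even though count(DISTINCT o.order_id) still looks right. Below is a minimal sketch of that effect. It uses Python's built-in sqlite3 rather than SQL Server, and the sample rows (including the duplicate segment row) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, account_id INTEGER, order_margin REAL);
    CREATE TABLE emailsegment (account_id INTEGER, segment TEXT);
    INSERT INTO orders VALUES (1, 10, 5.0), (2, 10, 7.0), (3, 20, 4.0);
    -- account 10 appears twice in segment 'H' (hypothetical duplicate)
    INSERT INTO emailsegment VALUES (10, 'H'), (10, 'H'), (20, 'X');
""")

# Original form: IN acts as a semi-join, so each order is counted once
in_count, in_sum = conn.execute("""
    SELECT count(DISTINCT o.order_id), sum(o.order_margin)
    FROM orders o
    WHERE o.account_id IN (SELECT e.account_id FROM emailsegment e WHERE e.segment = 'H')
""").fetchone()

# Suggested form: INNER JOIN repeats an order for every matching segment row
join_count, join_sum = conn.execute("""
    SELECT count(DISTINCT o.order_id), sum(o.order_margin)
    FROM orders o
    INNER JOIN emailsegment e ON e.account_id = o.account_id AND e.segment = 'H'
""").fetchone()

print(in_count, in_sum)
print(join_count, join_sum)
```

If account_id is unique per segment in emailsegment, both forms return the same totals.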

Just a few guesses:
How many rows does the following query return, and how long does it take to run?
SELECT e.account_id FROM emailsegment e WHERE e.segment = 'H';
It might be beneficial to create a filtered index on account_id, with the filter on the segment column:
CREATE INDEX ix_account_id_emailsegment ON emailsegment(account_id)
WHERE segment = 'H'
Or
CREATE INDEX ix_segment_emailsegment ON emailsegment(segment)
INCLUDE (account_id)
Whichever predicate filters out the most rows (created_date >= '1/1/2016', o.account_id IN (...), or o.order_status = 'Shipped') should have its column first in the index key. I would guess that is created_date, in which case you could create an index like:
CREATE INDEX ix_created_date_orders
on orders(created_date, account_id, order_id)
INCLUDE (order_margin)
WHERE order_status = 'Shipped';
If order_id is the clustered index key, you do not have to include it in the index.
You can also try adding DISTINCT to your subquery. I am not sure that will help.

Related

Optimise query with count and order by function

I have a problem optimizing this query. I have 3 tables (Products = Catalogo.GTIN, Sales Header = TEDEF.Factura and Sales Detail = TEDEF.Farmacia).
The query tries to find the mode of the column VPRODEXENIGV_FAR. Without the ORDER BY, it executes in less than 3 seconds (the detail table has about 30 million rows).
But when I add the ORDER BY clause, the query takes more than 30 minutes to run.
I want to know how I can optimize this query, or which indexes I need.
SELECT *
FROM Catalogo.GTIN G
CROSS APPLY
(SELECT TOP 1
COUNT(FAR.VPRODEXENIGV_FAR) [ROW],
YEAR(FAC2.VFECEMI_FAC) [AÑO],
MONTH(FAC2.VFECEMI_FAC) [MES],
FAR.VCODPROD_FAR_003,
CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM
TEDEF.Factura FAC2
INNER JOIN
TEDEF.Farmacia FAR ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
WHERE
G.CODIGO = FAR.VCODPROD_FAR_003
GROUP BY
YEAR(FAC2.VFECEMI_FAC),
MONTH(FAC2.VFECEMI_FAC),
FAR.VCODPROD_FAR_003,
FAR.VPRODEXENIGV_FAR
ORDER BY
1 DESC --- <----- THE PROBLEM IS HERE
) GG

Ouch! You have a hugely expensive dependent subquery. It's expensive because SELECT TOP(n) ... ORDER BY col DESC does a whole lot of work to create a result set only to discard all but one row. And, it's a dependent subquery so it's run for every row of Catalogo.GTIN .
It looks like you want to count the resultset rows in the most recent month and year for each Catalogo.GTIN row. So, let's try to refactor your query to do that.
We'll start with a subquery to grab the month-start date of the latest Factura row for each catalog entry.
SELECT CODIGO,
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes
FROM (
SELECT MAX(FAC2.VFECEMI_FAC) maxd,
G.CODIGO
FROM Catalogo.GTIN G
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
GROUP BY G.CODIGO
) maxd
It's wise to test this and make sure it works correctly and performs tolerably well. If you test it in SSMS, you can use "Show Actual Execution Plan" and see if it recommends an extra index. This subquery need only be run once, rather than once per G.CODIGO row.
Then we'll use it in your larger query.
SELECT G.*,
COUNT(FAR.VPRODEXENIGV_FAR) [ROW],
YEAR(FAC2.VFECEMI_FAC) [AÑO],
MONTH(FAC2.VFECEMI_FAC) [MES],
FAR.VCODPROD_FAR_003,
CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM Catalogo.GTIN G
JOIN (
SELECT CODIGO,
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes
FROM (
SELECT MAX(FAC2.VFECEMI_FAC) maxd,
G.CODIGO
FROM Catalogo.GTIN G
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
GROUP BY G.CODIGO
) maxd
) maxmes ON G.CODIGO = maxmes.CODIGO
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
AND FAC2.VFECEMI_FAC >= maxmes.maxmes
GROUP BY maxmes.maxmes,
G.CODIGO,
FAR.VCODPROD_FAR_003,
FAR.VPRODEXENIGV_FAR
Here is the tricky bit:
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes turns any date maxd into the first day of that month.
And, FAC2.VFECEMI_FAC >= maxmes.maxmes filters out rows before the first day of that month (for that CODIGO). It does so in a sargable way: a way that can exploit an index on FAC2.VFECEMI_FAC.
That is an alternative way to do TOP(1) ORDER BY d DESC. And faster.
It's all about sets of rows. Especially when using GROUP BY, it's performance-helpful to limit the number of rows in each set.
Obviously I cannot debug this.

It's me again. I finally solved the optimization problem: the query now takes about 20 seconds (with the sort and the count over a table of more than 30 million rows). I hope this helps others, or that the community can optimize it further.
I solved it by doing the sort through ROW_NUMBER instead. That way the server uses my index for the sort and works its magic:
WITH x
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GG.COD, GG.[AÑO], GG.[MES] ORDER BY GG.[ROW] DESC) [ID]
FROM Catalogo.GTIN G
CROSS APPLY
(
SELECT COUNT(FAR.VPRODEXENIGV_FAR) [ROW]
, YEAR(FAC2.VFECEMI_FAC) [AÑO]
, MONTH(FAC2.VFECEMI_FAC) [MES]
, FAR.VCODPROD_FAR_003 [COD]
, CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM TEDEF.Factura FAC2
INNER JOIN TEDEF.Farmacia FAR
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
WHERE G.CODIGO = FAR.VCODPROD_FAR_003
GROUP BY YEAR(FAC2.VFECEMI_FAC)
, MONTH(FAC2.VFECEMI_FAC)
, FAR.VCODPROD_FAR_003
, FAR.VPRODEXENIGV_FAR
-- ORDER BY 1 DESC --- <---- this is the bad guy, please, don't do that xD
) GG
) SELECT *
FROM x WHERE ID = 1
That way I can sort on the count and calculate the mode of the column FAR.VPRODEXENIGV_FAR.
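The same idea in miniature: count each value per group, rank the counts with ROW_NUMBER, and keep rank 1. The sketch below uses Python's built-in sqlite3 (window functions need SQLite 3.25 or later). The single table sales(cod, flag) is a hypothetical stand-in for the joined TEDEF tables, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in: one row per sold item, flag plays the role of VPRODEXENIGV_FAR
    CREATE TABLE sales (cod TEXT, flag TEXT);
    INSERT INTO sales VALUES
        ('P1', 'A'), ('P1', 'A'), ('P1', 'B'),
        ('P2', 'B'), ('P2', 'B'), ('P2', 'B'), ('P2', 'A');
""")

modes = conn.execute("""
    WITH counted AS (
        -- how often each flag value occurs per product
        SELECT cod, flag, COUNT(*) AS n
        FROM sales
        GROUP BY cod, flag
    ),
    ranked AS (
        -- rank the counts inside each product, highest first
        SELECT cod, flag, n,
               ROW_NUMBER() OVER (PARTITION BY cod ORDER BY n DESC) AS rn
        FROM counted
    )
    SELECT cod, flag, n FROM ranked WHERE rn = 1 ORDER BY cod
""").fetchall()

print(modes)
```

When two values tie for the highest count, ROW_NUMBER picks one of them arbitrarily unless the ORDER BY gets a tie-breaker column.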

Display of online users on the system

I don't know exactly where I'm wrong, but I need a list of all the workers who are currently at work (for the current day), this is my sql query:
SELECT
zp.ID,
zp.USER_ID,
zp.Arrive,
zp.Deppart,
zp.DATUM
FROM time_recording as zp
INNER JOIN personal AS a on zp.USER_ID = zp.USER_ID
WHERE zp.Arrive IS NOT NULL
AND zp.Deppart IS NULL
AND zp.DATUM = convert(date, getdate())
ORDER BY zp.ID DESC
this is what the data looks like with my query:
For me the question is, how can I correct my query so that I only get the last Arrive time for the current day for each user?
In this case to get only these values:

Try the script below, which uses ROW_NUMBER:
SELECT * FROM
(
SELECT zp.ID, zp.USER_ID, zp.Arrive, zp.Deppart, zp.DATUM,
ROW_NUMBER() OVER(PARTITION BY zp.User_id ORDER BY zp.Arrive DESC) RN
FROM time_recording as zp
INNER JOIN personal AS a
on zp.USER_ID = zp.USER_ID
-- You need to adjust the join condition above, as both sides refer to the same table
-- Also, since you select nothing from table personal, you can drop the JOIN entirely
WHERE zp.Arrive IS NOT NULL
AND zp.Deppart IS NULL
AND zp.DATUM = convert(date, getdate())
)A
WHERE RN =1
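To see what the ROW_NUMBER filter keeps, here is a small sketch using Python's built-in sqlite3 instead of SQL Server. The sample rows and the fixed date '2024-01-15' (standing in for convert(date, getdate())) are assumptions for illustration, and the join to personal is left out because nothing is selected from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_recording (ID INTEGER, USER_ID INTEGER, Arrive TEXT, Deppart TEXT, DATUM TEXT);
    INSERT INTO time_recording VALUES
        (1, 7, '08:00', '12:00', '2024-01-15'),  -- already left
        (2, 7, '13:00', NULL,    '2024-01-15'),  -- back at work
        (3, 8, '07:30', NULL,    '2024-01-15'),  -- open row, older
        (4, 8, '09:00', NULL,    '2024-01-15'),  -- open row, latest
        (5, 9, '08:15', NULL,    '2024-01-14');  -- previous day
""")

today = '2024-01-15'  # hypothetical "current day"
rows = conn.execute("""
    SELECT ID, USER_ID, Arrive FROM (
        SELECT zp.ID, zp.USER_ID, zp.Arrive,
               ROW_NUMBER() OVER (PARTITION BY zp.USER_ID ORDER BY zp.Arrive DESC) AS RN
        FROM time_recording AS zp
        WHERE zp.Arrive IS NOT NULL
          AND zp.Deppart IS NULL
          AND zp.DATUM = ?
    ) AS A
    WHERE RN = 1
    ORDER BY USER_ID
""", (today,)).fetchall()

print(rows)
```

Each user who is still clocked in today comes back once, with the most recent Arrive value.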

You can try this:
SELECT DISTINCT
USER_ID,
LAR.LastArrive
FROM time_recording as tr
CROSS APPLY (
SELECT
MAX(Arrive) as LastArrive
FROM time_recording as ta
WHERE
tr.USER_ID = ta.USER_ID AND
ta.Arrive IS NOT NULL
) as LAR

Getting more than two counts in one query

scenario: I have table A and Table B. Both have the primary key of Rep_time_frame. We replicate data by terms. however, that process at times was repeated.
task: Match the source counts with the destination table counts.
problem: I can match the rows of the source table but not the destination table because the replication for some terms happened twice. So, my counts are off.
SELECT d.REPT_TIME_FRAME,
c.SOURCE_TOTAL_RECORDS,
COUNT(*) TOTAL_RECORDS,
CASE
WHEN COUNT(*) > c.SOURCE_TOTAL_RECORDS THEN c.SOURCE_TOTAL_RECORDS
ELSE COUNT(*)
END FINAL_TOTAL_RECORDS,
***CASE
WHEN EXISTS (SELECT REPT_TIME_FRAME
FROM TableA_Counts
WHERE REPT_TIME_FRAME IN (201705, 201708, 201801, 201706, 201710, 201803)
GROUP BY REPT_TIME_FRAME
HAVING COUNT(*) > 1) THEN 'UPLOADED MORE THAN ONCE'
ELSE 'UPLOADED ONCE'
END NUMBER_OF_REPLICATIONS***
FROM TableA d
INNER JOIN TableA_Counts c ON (d.REPT_TIME_FRAME = c.REPT_TIME_FRAME)
WHERE d.REPT_TIME_FRAME IN (201705, 201708, 201801, 201706, 201710, 201803)
GROUP BY d.REPT_TIME_FRAME, c.SOURCE_TOTAL_RECORDS
Please note that the bolded area, when run, returns 'UPLOADED MORE THAN ONCE' in the new NUMBER_OF_REPLICATIONS column for every row. For the purposes of this example, here are the facts:
201705 was replicated twice;
201801 and 201708 were each replicated once.
Expected result: I want 201705 to show 'UPLOADED MORE THAN ONCE' and the rest (201801 and 201708) to show 'UPLOADED ONCE'.

Does this help?
http://sqlfiddle.com/#!18/8692a/1
I always try and get the counts then do a case statement on the final figures. I've based that SqlFiddle on the notion that TableA_Counts is the table with more than one entry for 201705. Is that correct?
CREATE TABLE TableA (REPT_TIME_FRAME int)
INSERT INTO TableA(REPT_TIME_FRAME)VALUES
(201705),
(201708),
(201801),
(201706),
(201710),
(201803 )
CREATE TABLE TableA_Counts (REPT_TIME_FRAME int)
INSERT INTO TableA_Counts(REPT_TIME_FRAME)VALUES
(201705),
(201708),
(201705),
(201801),
(201706),
(201710),
(201803)
SELECT
d.REPT_TIME_FRAME,count(*) as TOTAL_RECORDS,SOURCE_TOTAL_RECORDS,
CASE WHEN SOURCE_TOTAL_RECORDS > 1 then 'more than' else 'just one' end as NUMBER_OF_REPLICATIONS
FROM
TableA d
LEFT JOIN
(SELECT REPT_TIME_FRAME ,count(*) as SOURCE_TOTAL_RECORDS from TableA_Counts group by REPT_TIME_FRAME) c ON (d.REPT_TIME_FRAME = c.REPT_TIME_FRAME)
group by d.REPT_TIME_FRAME,SOURCE_TOTAL_RECORDS
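The EXISTS in the question is not correlated with the outer row, so it is true for every time frame as soon as any one of them is duplicated. Counting per REPT_TIME_FRAME and applying the CASE to that count avoids this. A minimal sketch of just that step, using Python's built-in sqlite3 rather than SQL Server, with sample rows matching the facts stated in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA_Counts (REPT_TIME_FRAME INTEGER);
    -- 201705 was replicated twice, the others once
    INSERT INTO TableA_Counts VALUES (201705), (201708), (201705), (201801);
""")

rows = conn.execute("""
    SELECT REPT_TIME_FRAME,
           CASE WHEN COUNT(*) > 1 THEN 'UPLOADED MORE THAN ONCE'
                ELSE 'UPLOADED ONCE'
           END AS NUMBER_OF_REPLICATIONS
    FROM TableA_Counts
    GROUP BY REPT_TIME_FRAME
    ORDER BY REPT_TIME_FRAME
""").fetchall()

for row in rows:
    print(row)
```

This per-time-frame result can then be joined to TableA as a derived table, as in the query above.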

SELECT d.REPT_TIME_FRAME,
c.SOURCE_TOTAL_RECORDS,
COUNT(*) TOTAL_RECORDS,
CASE
WHEN COUNT(*) > c.SOURCE_TOTAL_RECORDS THEN c.SOURCE_TOTAL_RECORDS
ELSE COUNT(*)
END FINAL_TOTAL_RECORDS,
a.count AS NUM_OF_REPLICATIONS
FROM TableA d
INNER JOIN TableA_Counts c ON (d.REPT_TIME_FRAME = c.REPT_TIME_FRAME)
INNER JOIN (SELECT REPT_TIME_FRAME ,count(REPT_TIME_FRAME) as count
FROM TableA_Counts
WHERE REPT_TIME_FRAME IN (201705, 201708, 201801, 201706, 201710, 201803)
GROUP BY REPT_TIME_FRAME) AS a ON (a.REPT_TIME_FRAME = d.REPT_TIME_FRAME)
WHERE d.REPT_TIME_FRAME IN (201705, 201708, 201801, 201706, 201710, 201803)
GROUP BY d.REPT_TIME_FRAME, c.SOURCE_TOTAL_RECORDS, a.count

SQL Query Get Last record Group by multiple fields

Hi I have a table with following fields:
ALERTID POLY_CODE ALERT_DATETIME ALERT_TYPE
I need to query above table for records in the last 24 hour.
Then group by POLY_CODE and ALERT_TYPE and get the latest Alert_Level value ordered by ALERT_DATETIME
I can get up to this, but I need the AlertID of the resulting records.
Any suggestions what would be an efficient way of getting this ?
I have created an SQL in SQL Server. See below
SELECT POLY_CODE, ALERT_TYPE, X.ALERT_LEVEL AS LAST_ALERT_LEVEL
FROM
(SELECT * FROM TableA where ALERT_DATETIME >= GETDATE() -1) T1
OUTER APPLY (SELECT TOP 1 [ALERT_LEVEL]
FROM (SELECT * FROM TableA where ALERT_DATETIME >= GETDATE() -1) T2
WHERE T2.POLY_CODE = T1.POLY_CODE AND
T2.ALERT_TYPE = T1.ALERT_TYPE ORDER BY T2.[ALERT_DATETIME] DESC) X
GROUP BY POLY_CODE, ALERT_TYPE, X.[ALERT_LEVEL]
POLY_CODE ALERT_TYPE ALERT_LEVEL
04575 Elec 2
04737 Gas 3
06239 Elec 2
06552 Elec 2
06578 Elec 2
10320 Elec 2

select top 1 with ties *
from TableA
where ALERT_DATETIME >= GETDATE() -1
order by row_number() over (partition by POLY_CODE,ALERT_TYPE order by [ALERT_DATETIME] DESC)
The way this works is that each group of POLY_CODE, ALERT_TYPE gets its own row_number() sequence, starting from the most recent alert_datetime. The WITH TIES clause then ensures that every row with a row_number value of 1 (one per group) is returned.

One way of doing it is to create a CTE that groups the rows and calculates the latest datetime for each group, then join it back to the table to get the results. Just keep in mind that if more than one record has the same combination of poly_code, alert_type, alert_level and datetime, they will all show.
WITH list AS (
SELECT ta.poly_code,ta.alert_type,MAX(ta.alert_datetime) AS LatestDatetime,
ta.alert_level
FROM dbo.TableA AS ta
WHERE ta.alert_datetime >= DATEADD(DAY,-1,GETDATE())
GROUP BY ta.poly_code, ta.alert_type,ta.alert_level
)
SELECT ta.*
FROM list AS l
INNER JOIN dbo.TableA AS ta ON ta.alert_level = l.alert_level AND ta.alert_type = l.alert_type AND ta.poly_code = l.poly_code AND ta.alert_datetime = l.LatestDatetime
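Note that grouping by alert_level as well yields the latest row per level. If the goal is one row per POLY_CODE and ALERT_TYPE, as the question describes, a variation groups by those two columns only. A sketch of that variation, using Python's built-in sqlite3 rather than SQL Server; the sample rows are made up and the 24-hour filter is omitted to keep the example deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ALERTID INTEGER, POLY_CODE TEXT, ALERT_TYPE TEXT,
                         ALERT_LEVEL INTEGER, ALERT_DATETIME TEXT);
    -- hypothetical rows: two alerts for 04575/Elec at different levels
    INSERT INTO TableA VALUES
        (1, '04575', 'Elec', 1, '2024-01-15 08:00'),
        (2, '04575', 'Elec', 2, '2024-01-15 09:00'),
        (3, '04737', 'Gas',  3, '2024-01-15 07:00');
""")

latest = conn.execute("""
    WITH list AS (
        -- latest datetime per (poly_code, alert_type), alert_level left out of the grouping
        SELECT poly_code, alert_type, MAX(alert_datetime) AS LatestDatetime
        FROM TableA
        GROUP BY poly_code, alert_type
    )
    SELECT ta.ALERTID, ta.POLY_CODE, ta.ALERT_TYPE, ta.ALERT_LEVEL
    FROM list AS l
    INNER JOIN TableA AS ta
        ON ta.poly_code = l.poly_code
       AND ta.alert_type = l.alert_type
       AND ta.alert_datetime = l.LatestDatetime
    ORDER BY ta.ALERTID
""").fetchall()

print(latest)
```

Rows that share the same latest datetime within a group would still all be returned.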

Calculated summary field based on child table

I have two tables, Order and OrderItem. There is a one-to-many relationship on Order.Order_ID=OrderItem.Order_ID
I want a query to return a list showing the status of each Order, COMPLETE or INCOMPLETE.
A COMPLETE Order is defined as one where all the related OrderItem records have a non-NULL, non-empty value in the OrderItem.Delivery_ID field.
This is what I have so far:
SELECT Order.Order_ID, 'INCOMPLETE' AS Order_status
FROM Order
WHERE EXISTS
(SELECT *
FROM OrderItem
WHERE OrderItem.Order_ID=Order.Order_ID
AND (OrderItem.Delivery_ID IS NULL OR OrderItem.Delivery_ID=''))
UNION
SELECT Order.Order_ID, 'COMPLETE' AS Order_status
FROM Order
WHERE NOT EXISTS
(SELECT *
FROM OrderItem
WHERE OrderItem.Order_ID=Order.Order_ID
AND (OrderItem.Delivery_ID IS NULL OR OrderItem.Delivery_ID=''))
ORDER BY Order_ID DESC
It works, but runs a bit slow. Is there a better way?
(N.B. I've restated the problem for clarity, actual table and field names are different)

I would suggest adding a status column to your Order table and updating it to complete once all order items have been delivered.
That will simplify the query that gets the status as well as improve performance.

Put it into a subquery to try to make the case statement less confusing:
SELECT Order_ID,
CASE WHEN incomplete_count > 0 THEN 'INCOMPLETE' ELSE 'COMPLETE' END
AS Order_status
FROM ( SELECT o.Order_ID
,SUM( CASE WHEN i.Delivery_ID IS NULL OR i.Delivery_ID='' THEN 1 ELSE 0 END )
AS incomplete_count
FROM Order o
INNER JOIN OrderItem i ON (i.Order_ID = o.Order_ID)
GROUP by o.Order_ID
) x
ORDER BY Order_ID DESC
The idea is to keep a counter every time you encounter a null item. If the sum is 0, there were no empty order items.
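A small runnable sketch of that counter, using Python's built-in sqlite3 rather than SQL Server. The sample rows are made up, and the CASE is folded into a single grouped query instead of a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (Order_ID INTEGER);
    CREATE TABLE OrderItem (Order_ID INTEGER, Delivery_ID TEXT);
    INSERT INTO "Order" VALUES (1), (2), (3);
    -- order 1: all delivered; order 2: one NULL; order 3: one empty string
    INSERT INTO OrderItem VALUES (1, 'D1'), (1, 'D2'), (2, 'D3'), (2, NULL), (3, '');
""")

statuses = conn.execute("""
    SELECT o.Order_ID,
           CASE WHEN SUM(CASE WHEN i.Delivery_ID IS NULL OR i.Delivery_ID = ''
                              THEN 1 ELSE 0 END) > 0
                THEN 'INCOMPLETE' ELSE 'COMPLETE'
           END AS Order_status
    FROM "Order" o
    INNER JOIN OrderItem i ON i.Order_ID = o.Order_ID
    GROUP BY o.Order_ID
    ORDER BY o.Order_ID DESC
""").fetchall()

print(statuses)
```

One difference from the UNION version in the question: with an INNER JOIN, orders that have no items at all drop out of the result, whereas the NOT EXISTS branch reports them as COMPLETE. A LEFT JOIN would keep them.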

Try this one -
SELECT
o.Order_ID
, Order_status =
CASE WHEN ot.Order_ID IS NULL
THEN 'COMPLETE'
ELSE 'INCOMPLETE'
END
FROM dbo.[Order] o
LEFT JOIN (
SELECT DISTINCT ot.Order_ID
FROM dbo.OrderItem ot
WHERE ISNULL(ot.Delivery_ID, '') = ''
) ot ON ot.Order_ID = o.Order_ID
