How to self join a CTE? - sql-server

Hi. This is the query I have written to fetch records from a history table. It works, but it takes too long to execute because it selects from the table twice, and the table has 20 million or more records, so I want to optimize it. I will try to explain what the query does and what I want to achieve.
I want to select two rows for each Id (ActivityId) from the table: the one where the date is the minimum and the one where the date is the maximum, so I can see how much progress happened in between. At the moment I select the minimum-date rows in one CTE (cte) and the maximum-date rows in a second one (cte2). I can select both rows in a single CTE, but then I am not able to combine them into a single record: I only need the Progress and Planned fields from the maximum-date row, plus all the other fields (most fields are the same on both rows). How can I achieve this? Please ask for clarification if needed.
CTE:
WITH cte AS
(
SELECT Row_number() OVER (partition BY pmaph.activityid ORDER BY date) r1,
pma.planenddate AS planenddate ,
pma.planstartdate AS planstartdate,
pmaph.activityid,
pmaph.projectmilestoneactivproghist_id AS promileavtiid,
umd.uom_name AS uomname,
pmm.milestonename AS milestonename,
pma.activityname AS activityname ,
pmp.projectname AS projectname,
Replace((Rtrim(Ltrim(CONVERT(VARCHAR(12),Cast(pmaph.date AS DATETIME),106)))),' ','-') AS rdate,
Isnull(pmaph.actual_progress,0) AS actualprogress,
Isnull(pmaph.planned_progress,0) AS plannedprogress
FROM projectmilestoneactivityprogresshistory AS pmaph
LEFT JOIN dbo.pm_project AS pmp
ON pmaph.projectid=pmp.projectid
LEFT JOIN dbo.pm_activity AS pma
ON pmaph.activityid=pma.activityid
LEFT JOIN dbo.pm_milestone AS pmm
ON pmaph.milestoneid=pmm.milestoneid
LEFT JOIN dbo.uomdetail AS umd
ON pma.uom_id=umd.uom_id
WHERE pmaph.client_id=1030),
cte2 AS
(
SELECT Row_number() OVER (partition BY pmaph.activityid ORDER BY date DESC) r2,
Isnull(pmaph.actual_progress,0) AS actualprogress,
Isnull(pmaph.planned_progress,0) AS plannedprogress,
Replace((Rtrim(Ltrim(CONVERT(VARCHAR(12),Cast(pmaph.date AS DATETIME),106)))),' ','-') AS rdate,
pmaph.activityid
FROM projectmilestoneactivityprogresshistory AS pmaph
WHERE pmaph.client_id=1030)
SELECT cte2.rdate AS todate,
cte2.actualprogress AS end_actualprogress,
cte2.plannedprogress AS end_plannedprogress,
cte.actualprogress AS start_actualprogress,
cte.plannedprogress AS start_plannedprogress,
cte.rdate AS fromdate ,
cte.planenddate,
cte.planstartdate,
cte.activityid,
cte.uomname,
cte.milestonename,
cte.activityname,
cte.projectname
FROM cte
JOIN cte2
ON cte.activityid= cte2.activityid
WHERE cte.r1=1
AND cte2.r2=1
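One possible way to avoid reading the history table twice, sketched under assumptions: number the rows per activity in both directions inside a single CTE, then collapse the first and last row of each activity into one record with conditional aggregation. The table variable @history and its sample rows are hypothetical stand-ins for projectmilestoneactivityprogresshistory, and the lookup joins are left out for brevity.

```sql
-- Hypothetical stand-in for projectmilestoneactivityprogresshistory.
DECLARE @history TABLE (
    activityid       INT,
    [date]           DATETIME,
    actual_progress  DECIMAL(9,2),
    planned_progress DECIMAL(9,2)
);

INSERT INTO @history (activityid, [date], actual_progress, planned_progress)
VALUES (1, '20160101', 10,   15),
       (1, '20160201', 40,   45),
       (1, '20160301', 80,   90),
       (2, '20160115', 5,    10),
       (2, '20160215', NULL, 30);

WITH numbered AS
(
    SELECT h.activityid,
           h.[date],
           ISNULL(h.actual_progress, 0)  AS actualprogress,
           ISNULL(h.planned_progress, 0) AS plannedprogress,
           -- r_first = 1 marks the earliest row, r_last = 1 the latest row
           ROW_NUMBER() OVER (PARTITION BY h.activityid ORDER BY h.[date] ASC)  AS r_first,
           ROW_NUMBER() OVER (PARTITION BY h.activityid ORDER BY h.[date] DESC) AS r_last
    FROM @history AS h
)
SELECT n.activityid,
       MAX(CASE WHEN n.r_first = 1 THEN n.[date] END)          AS fromdate,
       MAX(CASE WHEN n.r_last  = 1 THEN n.[date] END)          AS todate,
       MAX(CASE WHEN n.r_first = 1 THEN n.actualprogress END)  AS start_actualprogress,
       MAX(CASE WHEN n.r_first = 1 THEN n.plannedprogress END) AS start_plannedprogress,
       MAX(CASE WHEN n.r_last  = 1 THEN n.actualprogress END)  AS end_actualprogress,
       MAX(CASE WHEN n.r_last  = 1 THEN n.plannedprogress END) AS end_plannedprogress
FROM numbered AS n
WHERE n.r_first = 1 OR n.r_last = 1   -- keep only the two boundary rows
GROUP BY n.activityid;
```

The columns that are identical on both rows (project, milestone, activity names) could be joined in after this aggregation, so the lookups run once per activity rather than once per history row. Whether this beats the two-CTE version on 20 million rows depends on the available indexes, so it is worth comparing the execution plans.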

Related

How do I properly add this query into my existing query within Query Designer?

I currently have the below query written within Query Designer. I asked a question yesterday and it worked on its own but I would like to incorporate it into my existing report.
SELECT Distinct
i.ProductNumber
,i.ProductType
,i.ProductPurchaseDate
,ih.SalesPersonComputerID
,ih.SalesPerson
,ic2.FlaggedComments
FROM [Products] i
LEFT OUTER JOIN
(SELECT Distinct
MIN(c2.Comments) AS FlaggedComments
,c2.SalesKey
FROM [SalesComment] AS c2
WHERE(c2.Comments like 'Flagged*%')
GROUP BY c2.SalesKey) ic2
ON ic2.SalesKey = i.SalesKey
LEFT JOIN [SalesHistory] AS ih
ON ih.SalesKey = i.SalesKey
WHERE
i.SaleDate between #StartDate and #StopDate
AND ih.Status = 'SOLD'
My question yesterday was how to select only the first comment made for each sale. I already have a query that selects the flagged comments, but I want both the first comment and the flagged comment. Both would come from the same table. This was the query provided; it works on its own, but I can't figure out how to make it work with my existing query.
SELECT a.DateTimeCommented, a.ProductNumber, a.Comments, a.SalesKey
FROM (
SELECT
DateTimeCommented, ProductNumber, Comments, SalesKey,
ROW_NUMBER() OVER(PARTITION BY ProductNumber ORDER BY DateTimeCommented) as RowN
FROM [SalesComment]
) a
WHERE a.RowN = 1
Thank you so much for your assistance.
You can use a combination of row-numbering and aggregation to get both the Flagged% comments, and the first comment.
You may want to change the PARTITION BY clause to suit.
DISTINCT on the outer query is probably spurious; on the inner query it definitely is, as you have GROUP BY anyway. If you are getting multiple rows, don't just throw DISTINCT at it; instead, think about your joins and whether you need aggregation.
The second LEFT JOIN logically becomes an INNER JOIN due to the WHERE predicate. Perhaps that predicate should have been in the ON instead?
SELECT
i.ProductNumber
,i.ProductType
,i.ProductPurchaseDate
,ih.SalesPersonComputerID
,ih.SalesPerson
,ic2.FlaggedComments
,ic2.FirstComments
FROM [Products] i
LEFT OUTER JOIN
(SELECT
MIN(CASE WHEN c2.RowN = 1 THEN c2.Comments END) AS FirstComments
,c2.SalesKey
,MIN(CASE WHEN c2.Comments like 'Flagged*%' THEN c2.Comments END) AS FlaggedComments
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY DateTimeCommented) as RowN
FROM [SalesComment]
) AS c2
GROUP BY c2.SalesKey
) ic2 ON ic2.SalesKey = i.SalesKey
JOIN [SalesHistory] AS ih
ON ih.SalesKey = i.SalesKey
WHERE
i.SaleDate between #StartDate and #StopDate
AND ih.Status = 'SOLD'
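The point about the LEFT JOIN turning into an INNER JOIN can be seen in a small sketch. The table variables and rows below are made up for illustration and only borrow the column names from the question.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @Products     TABLE (SalesKey INT, ProductNumber VARCHAR(10));
DECLARE @SalesHistory TABLE (SalesKey INT, Status VARCHAR(10));

INSERT INTO @Products     VALUES (1, 'P-1'), (2, 'P-2'), (3, 'P-3');
INSERT INTO @SalesHistory VALUES (1, 'SOLD'), (2, 'RETURNED');

-- Predicate in WHERE: unmatched rows have Status = NULL and are filtered out,
-- so this behaves like an INNER JOIN and returns only P-1.
SELECT p.ProductNumber, h.Status
FROM @Products AS p
LEFT JOIN @SalesHistory AS h
    ON h.SalesKey = p.SalesKey
WHERE h.Status = 'SOLD';

-- Predicate in ON: every product is kept; Status is NULL where there is
-- no SOLD history row. Returns P-1 (SOLD), P-2 (NULL), P-3 (NULL).
SELECT p.ProductNumber, h.Status
FROM @Products AS p
LEFT JOIN @SalesHistory AS h
    ON  h.SalesKey = p.SalesKey
    AND h.Status   = 'SOLD';
```

Which form is right depends on whether products without a SOLD history row should appear in the report.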

Joins and subqueries in SQL Server

I am trying to find all continents and their most-used currency.
ContinentCode
CurrencyCode
CurrencyUsage
I am not familiar with grouping so I will be very grateful if you can give me a hint using only subqueries and joins if they can be used adequately here.
Join the countries to the continents. Then aggregate to count how many times each currency is used. Then use row_number() (or rank(), if you want to keep ties) to produce an ordinal per continent: the more a currency is used, the lower its ordinal. Only keep the rows where this ordinal equals 1.
SELECT x.continentcode,
x.currencycode,
x.currencyusage
FROM (SELECT ct.continentcode,
cy.currencycode,
count(cy.currencycode) currencyusage,
row_number() OVER (PARTITION BY ct.continentcode
ORDER BY count(cy.currencycode) DESC) rn
FROM continents ct
LEFT JOIN countries cy
ON cy.continentcode = ct.continentcode
GROUP BY ct.continentcode,
cy.currencycode) x
WHERE x.rn = 1;
And next time do not post images of tables. Instead paste the CREATE and INSERT statements to create them as text.
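To illustrate the rank() variant mentioned above, here is a minimal sketch on made-up data. The table variable @countries and its rows are assumptions standing in for the real countries table.

```sql
-- Hypothetical sample data standing in for the countries table.
DECLARE @countries TABLE (
    countrycode   CHAR(2),
    continentcode CHAR(2),
    currencycode  CHAR(3)
);

INSERT INTO @countries VALUES
    ('DE', 'EU', 'EUR'), ('FR', 'EU', 'EUR'), ('GB', 'EU', 'GBP'),
    ('US', 'NA', 'USD'), ('CA', 'NA', 'CAD');

SELECT x.continentcode,
       x.currencycode,
       x.currencyusage
FROM (SELECT cy.continentcode,
             cy.currencycode,
             COUNT(*) AS currencyusage,
             -- RANK gives tied currencies the same ordinal
             RANK() OVER (PARTITION BY cy.continentcode
                          ORDER BY COUNT(*) DESC) AS rk
      FROM @countries AS cy
      GROUP BY cy.continentcode,
               cy.currencycode) AS x
WHERE x.rk = 1;
-- EU: EUR (2). NA: both USD (1) and CAD (1), because they tie.
```

With row_number() instead of rank(), only one of the two tied NA currencies would be returned, and which one is not guaranteed.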

Max(datetime) ignore seconds

My SQL query works in general, but when several rows have a date with the same hour and minute, I can't get it to return the right result. For example:
With max(trans_date) on the column "trans_date" I get no results; somehow the query seems to be ignoring the seconds.
This is my complete sql sentence:
SELECT till.code,art.description
FROM [TCPOS4].[dbo].[transactions] as tra,
TCPOS4.dbo.articles as art,[TCPOS4].[dbo].[trans_articles] as tro,
[TCPOS4].[dbo].[tills] as till,[TCPOS4].[dbo].[shops] as shop
where tra.till_id=till.id and shop.id=till.shop_id and tro.transaction_id=tra.id and
art.id=tro.article_id and tra.trans_date =(select max(trans_date)
from tcpos4.dbo.transactions as t2 where t2.till_id=tra.till_id and trans_date > '2016-10-26 00:00:0.000' and trans_date< '2016-10-27 00:00:00.000' )
group by till.code,art.description
With this query I am getting for each "code" from the 2016-10-26 to 2016-10-27 the max transaction_date, but I am not getting any information from the code "5446". I should get "TABLE CHOCOLECHE-CONGUITOS" because it's the max trans_date in the range.
Can you try with a different approach like the following?
SELECT code, description
FROM
(
SELECT till.code, art.description,
row_number() OVER (PARTITION BY till.code ORDER BY trans_date DESC) RowNum
FROM [TCPOS4].[dbo].[transactions] AS tra
LEFT JOIN [TCPOS4].[dbo].[tills] AS till
ON tra.till_id=till.id
LEFT JOIN [TCPOS4].[dbo].[shops] AS shop
ON shop.id=till.shop_id
LEFT JOIN [TCPOS4].[dbo].[trans_articles] AS tro
ON tro.transaction_id=tra.id
LEFT JOIN TCPOS4.dbo.articles AS art
ON art.id=tro.article_id
) sbt
WHERE RowNum=1
In this way you will get one result for each till.code even if you have the same exact date.
You can add more fields in the ORDER BY if needed.
EDIT: removed art.description in PARTITION BY.
EDIT 2: converted with LEFT JOIN
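As a sketch of adding a tie-breaker to the ORDER BY, and of keeping the date range from the question, the example below runs on made-up data. The table variable @transactions and its rows are hypothetical, and using id as the tie-breaker is an assumption about what "latest" should mean when two timestamps are identical.

```sql
-- Hypothetical sample data standing in for the transactions table.
DECLARE @transactions TABLE (id INT, till_id INT, trans_date DATETIME);

INSERT INTO @transactions VALUES
    (1, 10, '2016-10-26T09:15:20'),
    (2, 10, '2016-10-26T18:30:05'),
    (3, 10, '2016-10-26T18:30:05'),  -- same timestamp as id 2
    (4, 20, '2016-10-26T12:00:00'),
    (5, 20, '2016-10-27T08:00:00');  -- outside the requested day

SELECT sbt.till_id, sbt.id, sbt.trans_date
FROM (SELECT t.till_id,
             t.id,
             t.trans_date,
             -- id DESC breaks ties between identical timestamps
             ROW_NUMBER() OVER (PARTITION BY t.till_id
                                ORDER BY t.trans_date DESC, t.id DESC) AS RowNum
      FROM @transactions AS t
      WHERE t.trans_date >= '20161026'
        AND t.trans_date <  '20161027') AS sbt
WHERE sbt.RowNum = 1;
-- till 10 -> id 3, till 20 -> id 4
```

Without the second ORDER BY column, either id 2 or id 3 could come back for till 10.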
try
...CAST(tra.trans_date AS DATE) = (select CAST(max(trans_date) AS DATE)...

Joining a calculated field to a field in another table

I have created a table variable called Table_A which has two columns, Age and Age_Range. The data type of Age is integer.
The next stage is a select statement where I’m pulling the Order_Number and a calculated field from Table_B. I want to join the calculated field from Table_B with Age from Table_A, so that I can see what the range is against the calculated field and its order number.
My first attempt was:
SELECT Order_Number, DATEDIFF(DAY,Order_Date,CAST(GETDATE()AS DATE)) AS Ageing, Age_Range
FROM Table_B LEFT JOIN Table_A ON Table_B.Ageing = Table_A.Age_Range
This didn’t work and I understand why. Usually in Access, I would just build the first query with the calculated field and then build the second query joining the calculated field with the desired field from the table. I’ve been looking at sub queries and derived tables, which I believe may solve my problem, but I’m not having any luck. I know this is a basic question, but I’ve just started out with SQL.
Thanks
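You cannot join like that because the SELECT list is logically evaluated after the FROM and JOIN clauses, so the alias Ageing does not exist yet when the ON condition is checked.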
You can read about it here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/70efeffe-76b9-4b7e-b4a1-ba53f5d21916/order-of-execution-of-sql-queries
You can work around this using CROSS APPLY:
SELECT Order_Number
, T.Ageing
, A.Age_Range
FROM Table_B AS B
CROSS APPLY (SELECT DATEDIFF(DAY, B.Order_Date, GETDATE())) AS T(Ageing)
LEFT JOIN Table_A AS A
ON T.Ageing = A.Age_Range
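A self-contained sketch of the CROSS APPLY approach follows. The sample rows and the fixed @today value are made up so the result is repeatable, and the join is written against the integer Age column on the assumption that this is the comparison the question describes.

```sql
-- Hypothetical sample data; @today replaces GETDATE() for repeatable output.
DECLARE @Table_A TABLE (Age INT, Age_Range VARCHAR(20));
DECLARE @Table_B TABLE (Order_Number INT, Order_Date DATETIME);

INSERT INTO @Table_A VALUES (1, '0-7 days'), (10, '8-30 days');
INSERT INTO @Table_B VALUES (1001, '20240109'), (1002, '20231231'), (1003, '20230101');

DECLARE @today DATETIME = '20240110';

SELECT B.Order_Number,
       T.Ageing,
       A.Age_Range
FROM @Table_B AS B
-- CROSS APPLY computes Ageing once per row and exposes it as a column,
-- so later joins can refer to it by name.
CROSS APPLY (SELECT DATEDIFF(DAY, B.Order_Date, @today)) AS T(Ageing)
LEFT JOIN @Table_A AS A
    ON A.Age = T.Ageing;
-- 1001 -> Ageing 1   -> '0-7 days'
-- 1002 -> Ageing 10  -> '8-30 days'
-- 1003 -> Ageing 374 -> NULL (no matching Age)
```

If Table_A instead stores range boundaries, the ON condition would need a BETWEEN-style comparison rather than equality.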
If elegance of the code is not necessary:
SELECT Order_Number, DATEDIFF(DAY,Order_Date,CAST(GETDATE()AS DATE)) AS Ageing, Age_Range
FROM Table_B LEFT JOIN Table_A ON DATEDIFF(DAY,Order_Date,CAST(GETDATE()AS DATE)) = Table_A.Age_Range
Otherwise use CROSS APPLY as already suggested (performance will be the same). By the way, you do not need to CAST GETDATE() to DATE; DATEDIFF works without it, so you can simply write:
SELECT Order_Number, DATEDIFF(DAY,Order_Date,GETDATE()) AS Ageing, Age_Range
FROM Table_B LEFT JOIN Table_A ON DATEDIFF(DAY,Order_Date,GETDATE()) = Table_A.Age_Range

SQL Server last date

Is there a way to get the row with the highest date without joining the table to itself and using max(date)? Is TOP 1 with ORDER BY ... DESC a valid option?
I use SQL Server 2000. And performance is important.
edit:
Table1:
columns: part - partdesc
Table 2:
columns: part - cost - date
select a.part,partdesc,b.cost
left join( select cost,part
right join(select max(date),part from table2 group by part) maxdate ON maxdate.date = bb.date
from table2 bb ) b on b.part = a.part
from table1
I don't know if the code above works but that is the query I dislike. And seems to me inefficient.
Here's a somewhat simplified query based on your edit.
SELECT
a.part,
a.partdesc,
sub.cost
FROM
Table1 A
INNER JOIN
(SELECT
B.part,
cost
FROM
Table2 B
INNER JOIN
(SELECT
part,
MAX(Date) as MaxDate
FROM
Table2
GROUP BY
part) BB
ON bb.part = b.part
AND bb.maxdate = b.date) Sub
ON sub.part = a.part
The sub-sub query will hopefully run a little bit quicker than your current version since it'll run once for the entire query, not once per part value.
SELECT TOP 1 columnlist
FROM table
ORDER BY datecol DESC
is certainly a valid option, assuming that your date column is precise enough to give you the results you need. In other words, if there is one row per day and your date reflects that, then sure; if there are several rows per minute, the column may not be precise enough to single out the latest row.
Performance will depend on your indexing strategy and hardware.
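For the per-part case from the edited question, TOP 1 can be used inside a correlated subquery. This is only a sketch on made-up data: the table variables and rows are hypothetical, and it deliberately avoids ROW_NUMBER() and CTEs, which only arrived in SQL Server 2005.

```sql
-- Hypothetical sample data; INSERT ... SELECT ... UNION ALL is used because
-- multi-row VALUES lists are not available on older versions.
DECLARE @Table1 TABLE (part VARCHAR(10), partdesc VARCHAR(50));
DECLARE @Table2 TABLE (part VARCHAR(10), cost DECIMAL(9,2), [date] DATETIME);

INSERT INTO @Table1 (part, partdesc)
SELECT 'A', 'Widget' UNION ALL
SELECT 'B', 'Gadget';

INSERT INTO @Table2 (part, cost, [date])
SELECT 'A', 1.00, '20080101' UNION ALL
SELECT 'A', 1.25, '20080301' UNION ALL
SELECT 'B', 9.50, '20080215';

SELECT a.part,
       a.partdesc,
       -- latest cost for this part: newest row first, keep one
       (SELECT TOP 1 b.cost
        FROM @Table2 AS b
        WHERE b.part = a.part
        ORDER BY b.[date] DESC) AS cost
FROM @Table1 AS a;
-- A -> 1.25, B -> 9.50
```

An index on (part, date) in Table2 would likely help this form, but whether it outperforms the MAX(date) join is something to measure on the real data.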

Resources