SQL table join to find matching data

We are in the healthcare business and we have a SQL database where we store the indications (how much care should the client receive in minutes) and the appointments (how much care have we delivered to the client in minutes).
The query I wrote is as follows:
SELECT
CF.ClientNr,
C.Name,
F.Description AS Function,
CF.Time AS Indication,
SUM(P.Minutes) As Production,
C.Insurance,
V.Name AS Team
FROM ClientFunctie AS CF
INNER JOIN Client AS C ON C.ClientNr=CF.ClientNr
INNER JOIN HoofdaanbiederFunctie AS H ON CF.HoofdaanbiederFunctieNr=H.HoofdaanbiederFunctieNr
INNER JOIN Functie AS F ON H.FunctieNr=F.FunctieNr
LEFT JOIN Planning AS P ON CF.ClientFunctieNr=P.ClientFunctieNr
LEFT JOIN DeclaratieAfhandeling AS D ON P.DeclaratieAfhandelingNr=D.DeclaratieAfhandelingNr
LEFT JOIN Vestiging AS V ON P.VestigingNr=V.VestigingNr
WHERE
D.StatusNr = 2 AND
CF.Einddatum > '2018-01-01' /*startdatum*/ AND
CF.Startdatum < '2018-01-28' /*einddatum*/ AND
CF.IsAkkoord = 1 AND
F.Omschrijving LIKE '%mwa%' AND
F.Omschrijving LIKE '%hv%'
GROUP BY
CF.ClientNr,
C.Name,
F.Description,
CF.Time,
C.Insurance,
V.Name
My goal is to know how many minutes have been delivered per indication, and how many indications have received 0 minutes.
Right now I only see the indications that have minutes in my results, but I also want to see which indications have 0 minutes. I tried playing with LEFT/RIGHT joins, but I'm out of options. I would really appreciate any help.

It is a bit hard without the table structure, but here is how I usually go about it.
I would extract the calculation into a subquery and adjust the main query accordingly, something like this:
SELECT CF.ClientNr,
C.Name,
F.Description AS Function,
CF.Time AS Indication,
WL.Production AS Production,
C.Insurance,
V.Name AS Team
FROM [...]
-- aggregate per indication (ClientFunctieNr) so the key matches the outer join
LEFT JOIN ( SELECT SUM(P.Minutes) AS Production, CF.ClientFunctieNr AS ClientFunctieNr
FROM ClientFunctie AS CF
INNER JOIN Planning AS P ON CF.ClientFunctieNr = P.ClientFunctieNr
INNER JOIN DeclaratieAfhandeling AS D ON P.DeclaratieAfhandelingNr = D.DeclaratieAfhandelingNr
WHERE D.StatusNr = 2
GROUP BY CF.ClientFunctieNr) AS WL ON CF.ClientFunctieNr = WL.ClientFunctieNr
(I do not have a SQL Server instance to test this on; hopefully it helps.)
You could use the same approach for counting the indications with 0 minutes.
Also, looking at your query, it seems you also need the name of the Team that did the job; your queries might need some adjustments if you need the total time regardless of which Team did the job.
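As a rough illustration of the subquery approach, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The tables indication and planning and all of their columns are hypothetical simplifications, not the real schema. It also shows one likely reason the zero-minute rows vanish in the original query: a WHERE filter on a column of the outer-joined table discards the NULL-extended rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for ClientFunctie and Planning.
    CREATE TABLE indication (id INTEGER PRIMARY KEY, client TEXT, minutes_indicated INTEGER);
    CREATE TABLE planning (id INTEGER PRIMARY KEY, indication_id INTEGER, minutes INTEGER, status INTEGER);
    INSERT INTO indication VALUES (1, 'Ann', 120), (2, 'Bob', 60);
    INSERT INTO planning VALUES (1, 1, 30, 2), (2, 1, 45, 2), (3, 1, 15, 1);
""")

# Filtering the outer-joined table in WHERE removes the NULL-extended rows,
# so the indication without any appointments disappears from the result.
where_filtered = conn.execute("""
    SELECT i.id, SUM(p.minutes)
    FROM indication AS i
    LEFT JOIN planning AS p ON p.indication_id = i.id
    WHERE p.status = 2
    GROUP BY i.id
    ORDER BY i.id
""").fetchall()

# Aggregating (and filtering) inside a subquery first keeps every indication;
# COALESCE turns the missing total into 0.
pre_aggregated = conn.execute("""
    SELECT i.id, COALESCE(w.production, 0)
    FROM indication AS i
    LEFT JOIN (
        SELECT indication_id, SUM(minutes) AS production
        FROM planning
        WHERE status = 2
        GROUP BY indication_id
    ) AS w ON w.indication_id = i.id
    ORDER BY i.id
""").fetchall()

print(where_filtered)   # only the indication that has approved minutes
print(pre_aggregated)   # both indications, the second with 0 minutes
```

In T-SQL the same idea would use ISNULL or COALESCE around WL.Production.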

Related

SQL Server slow query when adding third join, no matter what that join is

I have a SQL query as follows, pretty straightforward.
SELECT
p.ProgressClaimID,
min(p.ClaimDate) as ClaimDate,
min(p.PClaimValue) as PClaimValue,
sum(d.total) as TotalDaycost,
sum(i.amount) as TotalInvoice,
sum(round(pcd.QtyClaimed * pcd.SellRate,2)) as SellClaim
FROM
(ProgressClaim as p
LEFT JOIN
ProgressClaimDetail as pcd ON p.ProgressClaimID = pcd.ProgressClaimID)
LEFT JOIN
[DayCost] as d ON p.ProgressClaimID = d.ReportPeriod
LEFT JOIN
Invoice as i ON p.ProgressClaimID = i.ReportPeriod
WHERE
p.projectID = 4
GROUP BY
p.ProgressClaimID
But it is running very slowly (a couple of seconds) with very few rows (a few hundred at most) in SQL Server 2014. To make it more strange, this query runs as expected (pretty much instant) on my identical data on a SQL Server CE database.
In the SQL Server install, if I take out any join - it runs as expected with the remaining 3 tables - regardless of which one is removed.
I have checked FK, indexes etc. Nothing seems obvious. Any pointers appreciated.
Edit:
Execution plan at http://textuploader.com/5eurg (XML)

The problem is in my query design. This is what happens when you have been writing SQL for a while. Essentially, the query takes a long time because each LEFT JOIN multiplies the rows produced by the joins before it, and all of those copies are then summed, so there is a lot of computation going on by the third join.
I rewrote my query to do what I meant it to do, and it is all good.
Embarrassing mistake. Here it is for the curious:
SELECT p.ProgressClaimID, p.ClaimDate, p.PClaimValue,
pcd.pcdTotal,
d.TotalDaycost,
i.TotalInvoice
FROM ProgressClaim as p
left join (select ProgressClaimID, sum(round(QtyClaimed * SellRate,2)) as pcdTotal from ProgressClaimDetail group by ProgressClaimID) as pcd on p.ProgressClaimID=pcd.ProgressClaimID
left join (select ReportPeriod, sum(total) as TotalDaycost from DayCost group by ReportPeriod) as d on p.ProgressClaimID= d.ReportPeriod
left join (select ReportPeriod, sum(amount) as TotalInvoice from Invoice group by ReportPeriod) as i on p.ProgressClaimID= i.ReportPeriod
where p.projectID=4
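The row multiplication described above can be reproduced on a tiny data set. This is a sketch using Python's sqlite3 module rather than SQL Server, and the tables claim, daycost and invoice are hypothetical reductions of the ones in the question. It suggests that the chained joins not only do more work but also inflate the sums, while pre-aggregating each child table avoids both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced versions of ProgressClaim, DayCost and Invoice.
    CREATE TABLE claim (id INTEGER PRIMARY KEY);
    CREATE TABLE daycost (claim_id INTEGER, total INTEGER);
    CREATE TABLE invoice (claim_id INTEGER, amount INTEGER);
    INSERT INTO claim VALUES (1);
    INSERT INTO daycost VALUES (1, 10), (1, 20);
    INSERT INTO invoice VALUES (1, 100), (1, 200), (1, 300);
""")

# Chained joins: 2 daycost rows x 3 invoice rows = 6 joined rows per claim,
# so each daycost is counted 3 times and each invoice twice.
fanned_out = conn.execute("""
    SELECT c.id, COUNT(*), SUM(d.total), SUM(i.amount)
    FROM claim AS c
    LEFT JOIN daycost AS d ON d.claim_id = c.id
    LEFT JOIN invoice AS i ON i.claim_id = c.id
    GROUP BY c.id
""").fetchone()

# Pre-aggregated subqueries: each child table collapses to one row per claim
# before it is joined, so nothing is multiplied.
pre_aggregated = conn.execute("""
    SELECT c.id, d.total_daycost, i.total_invoice
    FROM claim AS c
    LEFT JOIN (SELECT claim_id, SUM(total) AS total_daycost
               FROM daycost GROUP BY claim_id) AS d ON d.claim_id = c.id
    LEFT JOIN (SELECT claim_id, SUM(amount) AS total_invoice
               FROM invoice GROUP BY claim_id) AS i ON i.claim_id = c.id
""").fetchone()

print(fanned_out)      # (id, joined rows, inflated daycost sum, inflated invoice sum)
print(pre_aggregated)  # (id, true daycost sum, true invoice sum)
```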

SQL Server left join on recent exchange rate

I ran into a problem with my T-SQL query. I have a warehouse database, and I want to add currency exchange rates to my transactions to see them in EUR and USD.
To do that, I am using European Central Bank currency rates.
My query looks like this:
SELECT
Companys.Companys_name,
Warehouse_oper.Pavad_num,
[Items]![Quantity]*[Items]![Price] AS Expr1,
[Items]![Quantity]*[Items]![Price]*[Exchange_rates]![USD] AS Expr2
FROM
(Companys
LEFT JOIN
(Exchange_rates
RIGHT JOIN
Warehouse_oper ON Exchange_rates.Date = Warehouse_oper.Date)
ON Companys.Companys_num_d_b = Warehouse_oper.Companys_nr_d_b)
LEFT JOIN
Items ON Warehouse_oper.Warehouse_oper_num_d_b = Items.Warehouse_oper_num_d_b;
Sorry if it's hard to understand; I translated all the identifiers to English.
Anyway, this query works fine with LEFT JOIN (Exchange_rates RIGHT JOIN Warehouse_oper ON Exchange_rates.Date = Warehouse_oper.Date), but the bank does not publish rates on holidays, so on those dates I get NULL values.
How can I edit this query (I know it's messy, but it's from Access) to select the most recent available date?
I tried:
LEFT JOIN
(Exchange_rates RIGHT JOIN Warehouse_oper ON
(SELECT TOP 1 Exchange_rates.Date FROM Exchange_rates WHERE
Exchange_rates.Date <= Warehouse_oper.Date) = Warehouse_oper.Date)
but with no success.
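One way the intent of that attempt might be expressed is to join each operation to the rate whose date is the latest one on or before the operation date. The following is only a sketch under assumptions: it uses Python's sqlite3 module instead of SQL Server, and the table and column names (exchange_rates, warehouse_oper, rate_date, oper_date, amount) are hypothetical simplifications. In T-SQL the same correlated MAX subquery should be usable, and OUTER APPLY with TOP 1 ... ORDER BY is another option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables; dates are ISO strings, which sort correctly as text.
    CREATE TABLE exchange_rates (rate_date TEXT PRIMARY KEY, usd REAL);
    CREATE TABLE warehouse_oper (id INTEGER PRIMARY KEY, oper_date TEXT, amount REAL);
    INSERT INTO exchange_rates VALUES ('2024-01-04', 1.25), ('2024-01-05', 1.5);
    -- No rate exists for the weekend of 2024-01-06 and 2024-01-07.
    INSERT INTO warehouse_oper VALUES
        (1, '2024-01-05', 100), (2, '2024-01-07', 100), (3, '2024-01-01', 100);
""")

# For every operation, pick the rate row whose date is the latest one that is
# not after the operation date. Operations older than any rate stay NULL.
rows = conn.execute("""
    SELECT w.id, r.rate_date, w.amount * r.usd
    FROM warehouse_oper AS w
    LEFT JOIN exchange_rates AS r
      ON r.rate_date = (SELECT MAX(e.rate_date)
                        FROM exchange_rates AS e
                        WHERE e.rate_date <= w.oper_date)
    ORDER BY w.id
""").fetchall()

for row in rows:
    print(row)
```

Operation 2 falls on a day without a rate and picks up the rate from 2024-01-05; operation 3 predates every rate and keeps NULL values.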

How can I perform a conditional join in mssql?

I want to join a table to one of two possible tables, depending on data. Here's an attempt that did not work, but gets the idea across, I hope. Also, this is a mocked up example that may not be very realistic, so don't get too hung up on the idea this is representing real students and classes.
SELECT *
FROM
student
INNER JOIN class ON class.student_id = student.student_id
CASE
WHEN class.complete=0
THEN RIGHT OUTER JOIN report ON report.label_id = inprogress.class_id
WHEN class.complete=1
THEN RIGHT OUTER JOIN report ON report.label_id = completed.class_id
END
Any ideas?

You have two join conditions, and if either is true you want to perform the join. That's a boolean OR operation.
You simply need:
RIGHT OUTER JOIN report ON (CONDITION1) OR (CONDITION2)
Let's unravel that a moment though, what is condition 1 and what is condition 2?
WHEN class.complete=0
THEN RIGHT OUTER JOIN report ON report.label_id = inprogress.class_id
WHEN class.complete=1
THEN RIGHT OUTER JOIN report ON report.label_id = completed.class_id
Here you're putting together two conditions on each of your condition 1 and 2, so your condition 1 is:
class.complete = 0 AND report.label_id = inprogress.class_id
and your condition 2 is
class.complete = 1 AND report.label_id = completed.class_id
So the completed SQL should be something like (and this is untested off the top of my head)
RIGHT OUTER JOIN report ON (
class.complete = 0 AND report.label_id = inprogress.class_id
) OR (
class.complete = 1 AND report.label_id = completed.class_id
)
Worth mentioning:
I haven't run the above join, but I know from experience that its execution plan will be absolutely abominable. That won't matter if performance isn't important or your data set is small. If performance does matter, I strongly suggest you post a broader description of what you want, and we can talk about a better approach to getting your particular data set that won't perform so terribly. I would personally write a join like the one above only as a last resort, or if I was hacking something truly irrelevant.

Try this (untested):
SELECT *
FROM student S
INNER JOIN class c ON c.student_id = S.student_id
left outer join report ir on ir.label_id = inprogress.class_id AND c.complete=0
left outer join report cr on cr.label_id = completed.class_id AND c.complete=1
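To make the two-LEFT-JOIN pattern concrete, here is a small runnable sketch in Python's sqlite3 module (standing in for SQL Server). The schema is an assumption: two hypothetical tables, inprogress_report and completed_report, play the role of the "two possible tables", and the flag condition sits in each ON clause so only one of the joins can match per class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one flag column decides which table to read from.
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, complete INTEGER);
    CREATE TABLE inprogress_report (class_id INTEGER, label TEXT);
    CREATE TABLE completed_report (class_id INTEGER, label TEXT);
    INSERT INTO class VALUES (1, 0), (2, 1);
    -- Both tables hold a row for every class, so only the flag decides.
    INSERT INTO inprogress_report VALUES (1, 'draft-1'), (2, 'draft-2');
    INSERT INTO completed_report VALUES (1, 'final-1'), (2, 'final-2');
""")

# Each LEFT JOIN carries the flag test in its ON clause; the side that does
# not apply yields NULL, and COALESCE picks whichever side matched.
rows = conn.execute("""
    SELECT c.class_id, COALESCE(ir.label, cr.label)
    FROM class AS c
    LEFT JOIN inprogress_report AS ir
      ON ir.class_id = c.class_id AND c.complete = 0
    LEFT JOIN completed_report AS cr
      ON cr.class_id = c.class_id AND c.complete = 1
    ORDER BY c.class_id
""").fetchall()

print(rows)
```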

If you want to join to either of 2 tables with reasonable performance, write a stored procedure for each path (one that joins table 1 to table A, one that joins table 1 to table B). Make a third stored procedure that calls either the 1-A stored procedure or the 1-B stored procedure. This way an efficient query plan will be performed in each case without having to make it recompile on each call or generate a strange query plan.
In your case, you actually want some records from both of the tables you might join to. In that case, you want to union the results together rather than pick one or the other (and you can combine them in one sproc if you want to, it shouldn't hurt the query plan). If you are sure the records won't duplicate between the two queries (it seems like they wouldn't), then as usual use UNION ALL for performance.
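The UNION ALL variant could look roughly like this. Again this is a sketch in Python's sqlite3 module with a hypothetical schema (class, inprogress_report, completed_report), not the asker's real tables: each branch is a plain inner join restricted to one value of the flag, and because the branches cannot overlap, UNION ALL is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema, assumed for illustration only.
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, complete INTEGER);
    CREATE TABLE inprogress_report (class_id INTEGER, label TEXT);
    CREATE TABLE completed_report (class_id INTEGER, label TEXT);
    INSERT INTO class VALUES (1, 0), (2, 1), (3, 1);
    INSERT INTO inprogress_report VALUES (1, 'draft-1'), (2, 'draft-2');
    INSERT INTO completed_report VALUES (2, 'final-2'), (3, 'final-3');
""")

# One simple query per path; the complete flag makes the branches disjoint,
# so UNION ALL (no duplicate elimination) gives the combined result.
rows = conn.execute("""
    SELECT c.class_id, r.label
    FROM class AS c
    JOIN inprogress_report AS r ON r.class_id = c.class_id
    WHERE c.complete = 0
    UNION ALL
    SELECT c.class_id, r.label
    FROM class AS c
    JOIN completed_report AS r ON r.class_id = c.class_id
    WHERE c.complete = 1
    ORDER BY 1
""").fetchall()

print(rows)
```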

Right Join within a Left Join in SQL Server Query

I'm in the process of converting some MS Access queries into Transact-SQL format and have run into some problems. Is there a way to write a Join within a Join?
For example:
LEFT JOIN (TaxInfo RIGHT JOIN TaxInfoJackpot
ON TaxInfo.RefNumber = TaxInfoJackpot.RefNumber)
ON HandPay.SlipNumber = TaxInfoJackpot.SlipNumber
This is just a snapshot of a much larger query of course. But, if anyone knows if this is possible any help would be great.
Thanks in advance.

I tend to like all of my joins to be sequential and flowing in the same direction, when possible (and I try to always re-order things so it is possible). LEFT JOIN / RIGHT JOIN / ON / ON is very confusing to follow for anyone, myself included, and I've been doing this for a very long time. Access certainly doesn't do anyone any favors with the bizarre syntax it pumps out (and accepts).
I am not sure if the current syntax provides the results you expect, but can you compare to this format to see if they're the same? Hard to know for sure without sample data and desired results.
SELECT ...
FROM dbo.TaxInfoJackPot AS jp
LEFT OUTER JOIN dbo.HandPay AS hp
ON hp.SlipNumber = jp.SlipNumber
LEFT OUTER JOIN dbo.TaxInfo AS ti
ON jp.RefNumber = ti.RefNumber;

You can do this with a subquery.
LEFT JOIN (
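SELECT j.SlipNumber, j.RefNumber -- list any other columns you need explicitly;
-- SELECT * would expose RefNumber twice, which a derived table does not allow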
FROM TaxInfo ti
RIGHT JOIN TaxInfoJackpot j ON ti.RefNumber = j.RefNumber
) tij ON HandPay.SlipNumber = tij.SlipNumber
But I'm not sure you actually need to do it this way. I think you can do this with just normal joins:
FROM HandPay h
RIGHT JOIN TaxInfoJackpot j ON h.SlipNumber = j.SlipNumber
LEFT JOIN TaxInfo ti ON j.RefNumber = ti.RefNumber;

How to improve SQL Query Performance

I have the following DB Structure (simplified):
Payments
----------------------
Id        | int
InvoiceId | int
Active    | bit
Processed | bit

Invoices
----------------------
Id              | int
CustomerOrderId | int

CustomerOrders
------------------------------------
Id                       | int
ApprovalDate             | DateTime
ExternalStoreOrderNumber | nvarchar
Each Customer Order has an Invoice and each Invoice can have multiple Payments.
The ExternalStoreOrderNumber is a reference to the order from the external partner store we imported the order from and the ApprovalDate the timestamp when that import happened.
Now we have the problem that an import went wrong and we need to move some payments to other invoices (several hundred, so too many to do by hand) according to the following logic:
Search the Invoice of the Order which has the same external number as the current one but starts with 0 instead of the current digit.
To do that I created the following query:
UPDATE DB.dbo.Payments
SET InvoiceId=
(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000)))
WHERE Id IN (
SELECT P.Id
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00')
I started that query on a test system using the live data (~250,000 rows in each table) and it has now been running for 16 hours. Did I do something completely wrong in the query, or is there a way to speed it up a little?
It does not need to be really fast, as it is a one-time task, but several hours seems long to me, and since I want to learn for the (hopefully never happening) next time, I would like some feedback on how to improve it.

You might as well kill the query. Your update subquery is completely un-correlated to the table being updated. From the looks of it, when it completes, EVERY SINGLE dbo.payments record will have the same value.
To break down your query, you might find that the subquery runs fine on its own.
SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000))
That is always a BIG worry: a subquery that runs on its own does not depend on the row being updated.
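The effect of an uncorrelated subquery in SET can be seen on three rows. This is a minimal sketch in Python's sqlite3 module, with hypothetical cut-down payments and invoices tables, and LIMIT 1 playing the role of TOP 1. Because the subquery never refers to the row being updated, every payment ends up with the same invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of Invoices and Payments.
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_order_id INTEGER);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER);
    INSERT INTO invoices VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO payments VALUES (1, 10), (2, 20), (3, 30);
""")

# The subquery does not mention payments at all, so it produces one value
# that is written to every row touched by the UPDATE.
conn.execute("""
    UPDATE payments
    SET invoice_id = (SELECT i.id FROM invoices AS i ORDER BY i.id LIMIT 1)
""")

after_uncorrelated = conn.execute(
    "SELECT DISTINCT invoice_id FROM payments"
).fetchall()
print(after_uncorrelated)  # a single distinct invoice id remains
```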
The next problem is that it runs row by row, once for every record in the table.
You are also double-dipping into Payments: the WHERE clause selects ids from a join that involves Payments itself. You can reference the table being updated in the JOIN clause using this pattern:
UPDATE P
....
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
Moving on, another mistake is to use TOP without ORDER BY. That's asking for random results. If you know there's only one result, you wouldn't even need TOP. In this case, maybe you're ok with randomly choosing one from many possible matches. Since you have three levels of TOP(1) without ORDER BY, you might as well just mash them all up (join) and take a single TOP(1) across all of them. That would make it look like this
SET InvoiceId=
(SELECT TOP 1 I.Id
FROM DB.dbo.Invoices AS I
JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId=O.Id
JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber,1,10000)
AND OO.Id=I.CustomerOrderId)
However, as I mentioned very early on, this is not being correlated to the main FROM clause at all. We move the entire search into the main query so that we can make use of JOIN-based set operations rather than row-by-row subqueries.
Before I show the final query (fully commented): I think your SUBSTRING is supposed to implement the rule "but starts with 0 instead of the current digit". If I read that correctly, then for an order number '5678' you're looking for '0678', which means SUBSTRING should use 2,10000 instead of 1,10000.
UPDATE P
SET InvoiceId=II.Id
FROM DB.dbo.Payments AS P
-- invoices for payments
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
-- orders for invoices
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- another order with '0' as leading digit
JOIN DB.dbo.CustomerOrders AS OO
ON OO.ExternalOrderNumber='0'+SUBSTRING(O.ExternalOrderNumber,2,10000)
-- invoices for this other order
JOIN DB.dbo.Invoices AS II ON OO.Id=II.CustomerOrderId
-- conditions for the Payments records
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
It is worth noting that SQL Server allows UPDATE ... FROM ... JOIN, which is not as widely supported by other DBMSs, e.g. Oracle. Be aware that a single row in Payments (the update target) could match several candidate II.Id values through the joins; in that case you will get an arbitrary one of them.
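Before running an update like this, it may help to run the same joins as a SELECT and inspect which payment would move to which invoice. The sketch below does that in Python's sqlite3 module with hypothetical, reduced tables; SQLite's || and substr(x, 2) stand in for T-SQL's + and SUBSTRING(x, 2, 10000), and the update is applied from the previewed pairs rather than with UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced tables; external_no plays the role of ExternalOrderNumber.
    CREATE TABLE customer_orders (id INTEGER PRIMARY KEY, external_no TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_order_id INTEGER);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER);
    INSERT INTO customer_orders VALUES (1, '5678'), (2, '0678'), (3, '9999');
    INSERT INTO invoices VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO payments VALUES (100, 10), (101, 30);
""")

# Preview: for each payment, find the invoice of the order whose number is the
# current order's number with its first character replaced by '0'.
targets = conn.execute("""
    SELECT p.id, ii.id
    FROM payments AS p
    JOIN invoices AS i ON i.id = p.invoice_id
    JOIN customer_orders AS o ON o.id = i.customer_order_id
    JOIN customer_orders AS oo
      ON oo.external_no = '0' || substr(o.external_no, 2)
    JOIN invoices AS ii ON ii.customer_order_id = oo.id
    ORDER BY p.id
""").fetchall()
print(targets)  # (payment id, new invoice id) pairs

# Apply the previewed pairs; payments without a matching order stay untouched.
conn.executemany(
    "UPDATE payments SET invoice_id = ? WHERE id = ?",
    [(invoice_id, payment_id) for payment_id, invoice_id in targets],
)
final = conn.execute("SELECT id, invoice_id FROM payments ORDER BY id").fetchall()
print(final)
```

Payment 100 moves from the invoice of order '5678' to the invoice of order '0678'; payment 101 has no '0999' counterpart and keeps its invoice.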

I think something like this will be more efficient, if I understood your query correctly. I wrote it by hand and didn't run it, so it may contain syntax errors.
UPDATE P
SET InvoiceId=(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
inner join DB.dbo.CustomerOrders AS O ON I.CustomerOrderId=O.Id
inner join DB.dbo.CustomerOrders AS OO ON OO.Id=I.CustomerOrderId
and O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber, 1, 10000))
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0
AND P.Processed=0
AND O.ApprovalDate='2012-07-19 00:00:00'

Try to rewrite it using JOINs. This will highlight some of the problems. Would the following do just the same? (The queries are somewhat different, but I guess this is roughly what you're trying to do.)
UPDATE P
SET InvoiceId = I.Id
FROM DB.dbo.Payments AS P
CROSS JOIN DB.dbo.Invoices AS I
INNER JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId = O.Id
INNER JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber = '0' + SUBSTRING(OO.ExternalOrderNumber, 1, 10000)
AND OO.Id = I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
As you see, two problems stand out:
The unconditional join between Payments and Invoices (of course, you've cut this off with a TOP 1 statement, but set-wise it's still unconditional). I'm not really sure whether this is a problem in your query. It will be in mine, though :).
The join on a 10000-character column (SUBSTRING), embedded in a condition. This is highly inefficient.
If you need a one-time speedup, run the queries on each table separately, store the intermediate results in temporary tables, create indexes on those temporary tables, and use the temporary tables to perform the update.