SQL join and get all data even not exist in main table - sql-server

I need to get a count for a header. Table A contains transaction records, and table B contains the status of each transaction. I want to aggregate the data to show the count of table A rows for each status in table B, but currently I cannot get the statuses that have no data.
I have created a SQLFiddle here.
I want every status to be displayed, with zero when there is no data for it in table A.
As you can see from the image above, the status with TSID 4 (Complete) is not listed with zero.
I have tried a script like this:
SELECT
COUNT(CASE WHEN A.Status = 0 THEN 1 END) AS Pending,
COUNT(CASE WHEN A.Status = 1 THEN 1 END) AS Assigned,
COUNT(CASE WHEN A.Status = 2 THEN 1 END) AS Started,
.
.
.
.
But I don't want to do it this way, because on the client I use a loop and render the display from the rows returned by SQL.
I have tried LEFT JOIN and LEFT OUTER JOIN, but did not get the expected result. Frankly speaking, I'm not very good with SQL joins.

You need a LEFT JOIN, but you must flip the order of the tables so that B is on the left:
SELECT
TS.TSID,
TS.TSName,
COUNT(T.Status) as Total
FROM B TS
LEFT JOIN A T ON T.Status = TS.TSID
GROUP BY
TS.TSID,
TS.TSName
ORDER BY
TS.TSID ASC;
SQL Fiddle
You can also do this as a RIGHT JOIN with the original order, but this can cause confusion, especially in the presence of other joins. Because A RIGHT JOIN B is equivalent to B LEFT JOIN A, you can always swap the tables around, which is why right joins are very rarely used in practice.
Note also that the grouping is over TS.TSID rather than T.Status, as T.Status is NULL for statuses with no transactions. For the same reason the count is COUNT(T.Status), which skips NULLs, rather than COUNT(*).
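The effect of putting the status table on the left can be checked with a small sketch. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and the sample rows (including the ID column on A) are made up; only the table and column names A, B, Status, TSID and TSName come from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the rows below are invented
# purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (TSID INTEGER PRIMARY KEY, TSName TEXT);
    CREATE TABLE A (ID INTEGER PRIMARY KEY, Status INTEGER);
    INSERT INTO B VALUES (0, 'Pending'), (1, 'Assigned'), (2, 'Started'), (4, 'Complete');
    INSERT INTO A (Status) VALUES (0), (0), (1), (2), (2), (2);
""")

# B is the left table, so every status survives the join.
# COUNT(T.Status) ignores the NULL produced for an unmatched status,
# while COUNT(*) would count that NULL-extended row as 1.
status_counts = conn.execute("""
    SELECT TS.TSID, TS.TSName, COUNT(T.Status) AS Total, COUNT(*) AS RowsInGroup
    FROM B AS TS
    LEFT JOIN A AS T ON T.Status = TS.TSID
    GROUP BY TS.TSID, TS.TSName
    ORDER BY TS.TSID
""").fetchall()

for row in status_counts:
    print(row)
```

With this sample data the Complete status appears with a Total of 0, while COUNT(*) reports 1 for it, which is why the answer counts T.Status.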

When you want all the records from the second table in the join, even if there is no match in the first table, you need a RIGHT JOIN. I have also removed MAX(T.status) from ORDER BY because it does not produce the result you want.
SELECT TS.TSID, TS.TSName,
COUNT(T.Status) AS Total
FROM A T
RIGHT JOIN B TS
ON T.Status = TS.TSID
GROUP BY
TS.TSID, TS.TSName
ORDER BY
TS.TSID ASC;

Related

Adding column to select statement brings in all historical data

Good evening all!
I'm running into a really odd issue that I'm having trouble understanding.
I have 3 tables (parts table, parts move history and a parts detail table).
What I'm trying to do is have the result set return lot#,part#,product description,quantity,part location, what's currently in inventory (versus full history) and who last moved the product.
Now, for the query. When I run the below query, I get a result set of 4,751 rows; which lines up perfectly with my expected results. However, when I try to add in the userid field, I then get a result set of 186,573. This large result set appears to pull in all historic data versus just matching the userid to the 4,751 rows I actually need.
From the Parts Table I need (prod_desc)
From the Parts Detail Table I need (lot,part#,lotquantity,prtlocation)
From the Parts Move History Table I need (move_date,user_id)
4,751 Query:
SELECT DISTINCT
inv.lot,
inv.part#,
prt.prod_desc,
inv.lotquantity,
inv.prtlocation,
MAX(mv.move_date)AS 'Move Date'
FROM invdet AS inv
LEFT JOIN movetable AS mv ON inv.part# = mv.part#
LEFT JOIN partmstr AS prt ON inv.part# = prt.part#
WHERE inv.lot IS NOT NULL
GROUP BY inv.lot,inv.part#,prt.prod_desc,inv.lotquantity,inv.prtlocation
ORDER BY inv.prtlocation
186,573 Query:
SELECT DISTINCT
inv.lot,
inv.part#,
prt.prod_desc,
inv.lotquantity,
inv.prtlocation,
MAX(mv.move_date) AS 'Move Date',
mv.user_id
FROM invdet AS inv
LEFT JOIN movetable AS mv ON inv.part# = mv.part#
LEFT JOIN partmstr AS prt ON inv.part# = prt.part#
WHERE inv.lot IS NOT NULL
GROUP BY inv.lot,inv.part#,prt.prod_desc,inv.lotquantity,inv.prtlocation,mv.user_id
ORDER BY inv.prtlocation
If I don't use the MAX function, I do not get current inventory and instead get all results in the table, which I do not need. I'm still learning and my GROUP BY's leave a lot to be desired as I'm still wrapping my head around it (open to suggestions!). I'm sure there's a subquery I can throw in here somewhere, but I'm still figuring those out as well. Any help is greatly appreciated!
I think the problem is that when you add mv.user_id from movetable to the SELECT and GROUP BY, you get one row per part and user, that is, every user who ever moved the part, and not only the last movement with MAX(mv.move_date).
One way is to remove the left join to movetable and use a CROSS APPLY instead, like this:
SELECT inv.lot, inv.part#, prt.prod_desc, inv.lotquantity, inv.prtlocation, x.move_date, x.user_id
FROM invdet AS inv
CROSS APPLY (SELECT TOP 1
mv.user_id, mv.move_date
FROM movetable mv
WHERE inv.part# = mv.part#
ORDER BY mv.move_date DESC) AS x
LEFT JOIN partmstr AS prt ON inv.part# = prt.part#
WHERE inv.lot IS NOT NULL
ORDER BY inv.prtlocation
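I've not tested it, but it should be fine. It may be a bit slow, because CROSS APPLY logically runs the subquery once for each row in invdet. Note also that CROSS APPLY drops parts that have no movement at all; use OUTER APPLY if those rows must be kept. If it is too slow, you can use ROW_NUMBER() to build a derived table containing only the last movement of each part, and then use it in the LEFT JOIN, as follows: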
SELECT inv.lot, inv.part#, prt.prod_desc, inv.lotquantity, inv.prtlocation, y.move_date, y.user_id
FROM invdet AS inv
LEFT JOIN (SELECT x.user_id, x.move_date, x.part#
FROM (SELECT mv.user_id, mv.move_date, mv.part#, rn = ROW_NUMBER() OVER (PARTITION BY mv.part# ORDER BY mv.move_date DESC)
FROM movetable mv) AS x
WHERE x.rn = 1) AS y ON y.part# = inv.part#
LEFT JOIN partmstr AS prt ON inv.part# = prt.part#
WHERE inv.lot IS NOT NULL
ORDER BY inv.prtlocation
Hope it helps.
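To see why grouping by user_id multiplies the rows and how a row-number filter avoids it, here is a rough sketch. It runs on an in-memory SQLite database through Python's sqlite3 module, not on SQL Server, and assumes SQLite 3.25 or later for window functions. The column is called part instead of part#, the tables are cut down to a few columns, and all rows and user names are invented.

```python
import sqlite3

# SQLite stand-in for SQL Server; simplified, made-up schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invdet (lot TEXT, part TEXT, lotquantity INTEGER);
    CREATE TABLE movetable (part TEXT, move_date TEXT, user_id TEXT);
    INSERT INTO invdet VALUES ('L1', 'P1', 10), ('L2', 'P2', 5), ('L3', 'P3', 7);
    INSERT INTO movetable VALUES
        ('P1', '2020-01-01', 'alice'),
        ('P1', '2020-03-01', 'bob'),
        ('P2', '2020-02-01', 'carol');
""")

# Grouping by user_id yields one row per lot AND user, so part P1
# shows up once for every user who ever moved it.
grouped = conn.execute("""
    SELECT inv.lot, MAX(mv.move_date), mv.user_id
    FROM invdet AS inv
    LEFT JOIN movetable AS mv ON mv.part = inv.part
    GROUP BY inv.lot, mv.user_id
    ORDER BY inv.lot, mv.user_id
""").fetchall()

# Numbering the moves per part, newest first, and keeping rn = 1
# leaves exactly one (the latest) move per part.
latest = conn.execute("""
    SELECT inv.lot, y.move_date, y.user_id
    FROM invdet AS inv
    LEFT JOIN (
        SELECT part, move_date, user_id,
               ROW_NUMBER() OVER (PARTITION BY part ORDER BY move_date DESC) AS rn
        FROM movetable
    ) AS y ON y.part = inv.part AND y.rn = 1
    ORDER BY inv.lot
""").fetchall()

print(len(grouped), "rows when grouping by user_id")
print(len(latest), "rows when keeping only the latest move")
```

The LEFT JOIN keeps lot L3, which has never been moved, with NULLs for the move columns.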

Get maximum price from a group by in SQL server

I'm sorry if my title isn't very descriptive, as I didn't know how to explain what I need in SQL in a title.
Basically I have 2 tables. There's a submissions table which contains invoicenumber and totalexvat (totalexvat is the sum of all the part_exvats in the livedata table).
The livedata table contains invoicenumber, part_code and part_price.
What I need to do is return all the data from the submissions table but also include ONLY the most expensive product in livedata (its part_code and part_price), joining the two tables together on invoicenumber.
submissions
invoicenumber | totalexvat
1             | £123.00
2             | £354.00
3             | £453.00
livedata
invoicenumber | part_code | part_price
1             | prt12345  | £100.00
1             | prt13643  | £20.00
1             | prt63456  | £3.00
2             | prt64232  | £300.00
2             | prt28258  | £54.00
3             | prt64232  | £300.00
3             | prt67252  | £153.00
I hope I have explained it well enough and hope somebody can help me.
What I usually do is add a join condition that checks that the price equals the max price for that invoice, using a subquery.
Something like this:
select *
from submissions s
inner join livedata l
on s.invoicenumber = l.invoicenumber
and l.part_price =
(select MAX(part_price)
from livedata
where invoicenumber = l.invoicenumber)
So, you could use a subquery to accomplish this, but there are a couple of potential problems. The subquery would look something like this:
SELECT submissions.totalexvat
,livedata.part_code
,livedata.part_price
FROM submissions
INNER JOIN
(SELECT invoicenumber
,MAX(part_price) AS part_price
FROM livedata
GROUP BY invoicenumber) ld_max
ON submissions.invoicenumber = ld_max.invoicenumber
INNER JOIN livedata
ON ld_max.invoicenumber = livedata.invoicenumber
AND ld_max.part_price = livedata.part_price
In this example the "ld_max" subquery identifies the price of the most expensive part per invoice number. Then you go back to the livedata table and rejoin to get the part_code that corresponds to that price.
The potential problem with this is that if multiple parts on an invoice share the same price, and that price is the maximum, then the final join returns all of those parts. If this is not the desired behaviour (it might be; the question doesn't make it clear), you could avoid it with a nested subquery that pulls only the top 1 result. But then you're arbitrarily taking one of the parts and excluding the others, which doesn't seem like a great idea. Nested correlated subqueries can also be slow on large tables, so I'd watch out for those.
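The tie behaviour described above is easy to reproduce. The following sketch again uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. The prices are stored as plain numbers, and the second part on invoice 2 (prt99999) is a made-up row added only to create a tie.

```python
import sqlite3

# SQLite stand-in; prt99999 is an invented row that ties for the maximum.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (invoicenumber INTEGER, totalexvat REAL);
    CREATE TABLE livedata (invoicenumber INTEGER, part_code TEXT, part_price REAL);
    INSERT INTO submissions VALUES (1, 123.0), (2, 600.0);
    INSERT INTO livedata VALUES
        (1, 'prt12345', 100.0),
        (1, 'prt13643', 20.0),
        (1, 'prt63456', 3.0),
        (2, 'prt64232', 300.0),
        (2, 'prt99999', 300.0);
""")

# Same shape as the ld_max query: find the max price per invoice,
# then rejoin to livedata to recover the part_code.
max_parts = conn.execute("""
    SELECT s.invoicenumber, l.part_code, l.part_price
    FROM submissions AS s
    INNER JOIN (SELECT invoicenumber, MAX(part_price) AS part_price
                FROM livedata
                GROUP BY invoicenumber) AS ld_max
        ON s.invoicenumber = ld_max.invoicenumber
    INNER JOIN livedata AS l
        ON ld_max.invoicenumber = l.invoicenumber
       AND ld_max.part_price = l.part_price
    ORDER BY s.invoicenumber, l.part_code
""").fetchall()

for row in max_parts:
    print(row)
```

Invoice 1 comes back once, but invoice 2 comes back twice because both of its parts share the maximum price.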
You should use a window function, i.e. ROW_NUMBER():
select *
from
(
select submissions.totalexvat,
livedata.part_code,
livedata.part_price,
row_number() over (partition by submissions.invoicenumber order by part_price desc) rn
from
submissions
inner join livedata on submissions.invoicenumber = livedata.invoicenumber
) r
where rn = 1

SQL Server Rewriting Left Join

I was having a problem with a larger query in SQL Server which I traced back to this section of code which isn't performing as expected.
SELECT item_name,item_num,rst_units
FROM tbl_item left join tbl_sales_regional on item_num=rst_item
WHERE rst_month=7 and rst_customer='AB123'
The first table (tbl_item) has 10,000 records. The second table (tbl_sales_regional) has 83 for the shown criteria.
Instead of returning 10,000 records with 9,917 nulls, the execution plan shows SQL Server has rewritten as an inner join and consequently returns 83 results.
In order to achieve the intended result, I have to join off a subquery. Can someone provide an explanation why this doesn't work?
Not sure which fields belong where, but you seem to have some fields from tbl_sales_regional in your WHERE condition.
Move them into the ON clause:
SELECT item_name, item_num, rst_units
FROM tbl_item
LEFT JOIN
tbl_sales_regional
ON rst_item = item_num
AND rst_month = 7
AND rst_customer = 'AB123'
The WHERE clause operates on the results of the join so these conditions cannot possibly hold true for any NULL records from tbl_sales_regional returned by the join, as NULL cannot be equal to anything.
That's why the optimizer transforms your query into the inner join.
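The difference between filtering in WHERE and filtering in ON can be seen on a tiny data set. This is a sketch only, run on an in-memory SQLite database via Python's sqlite3 module instead of SQL Server; the three items and two sales rows are invented, and the column-to-table assignment is the one guessed in the answer.

```python
import sqlite3

# SQLite stand-in; made-up rows, column placement as assumed in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_item (item_num INTEGER, item_name TEXT);
    CREATE TABLE tbl_sales_regional (
        rst_item INTEGER, rst_month INTEGER, rst_customer TEXT, rst_units INTEGER);
    INSERT INTO tbl_item VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
    INSERT INTO tbl_sales_regional VALUES (1, 7, 'AB123', 50), (2, 6, 'AB123', 20);
""")

# Filter in WHERE: rows where rst_month is NULL or not 7 are thrown away
# after the join, so unmatched items disappear.
where_filtered = conn.execute("""
    SELECT item_name, rst_units
    FROM tbl_item
    LEFT JOIN tbl_sales_regional ON item_num = rst_item
    WHERE rst_month = 7 AND rst_customer = 'AB123'
    ORDER BY item_num
""").fetchall()

# Filter in ON: the conditions only decide which sales rows match;
# every item is still returned.
on_filtered = conn.execute("""
    SELECT item_name, rst_units
    FROM tbl_item
    LEFT JOIN tbl_sales_regional
        ON item_num = rst_item
       AND rst_month = 7
       AND rst_customer = 'AB123'
    ORDER BY item_num
""").fetchall()

print(where_filtered)
print(on_filtered)
```

The first query returns only the matching item, the second returns all three items with NULL units for the two without a July sale.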
Any conditions on tbl_sales_regional columns in your WHERE clause are applied after the left join, which discards the NULL-extended rows and effectively makes it an inner join.
You need to change it to:
SELECT item_name,item_num,rst_units
FROM tbl_item left join tbl_sales_regional on item_num=rst_item
AND rst_month=7 and rst_customer='AB123'

SUM from multiple tables

I have three tables:
order
orderline (has orderID)
orderactions (has orderID)
Then I need SUM(of orderline.price) and SUM(of orderactions.price) per orderID.
When I used:
SELECT order.ID, SUM(orderlines.price), SUM(orderactions.price)
FROM order
LEFT JOIN orderlines ON orderlines.orderID=order.ID
LEFT JOIN orderactions ON orderactions.orderID=order.ID
WHERE order.ID=#orderID
GROUP BY order.ID
I got a result for orderactions which is equivalent to SUM(orderactions.price) multiplied by the number of orderlines rows for this order.ID.
The only solution I found that gives the right result is a sub-query for every summed table:
SELECT order.ID
, (SELECT SUM(orderlines.price) FROM orderlines WHERE orderlines.orderId=order.ID)
, (SELECT SUM(orderactions.price) FROM orderactions WHERE orderactions.orderId=order.ID)
FROM order
WHERE order.ID=#orderID
Question: is there some other (faster) solution for this? Sub-queries are slow and we try not to use them.
Since your select only runs the two very simple subqueries once each, I'd say you're pretty close to optimal for your query. There is no need to add a JOIN unless the queries are correlated in some way.
The slow part about subqueries is usually that the database may (if you're not careful how you write your query) non obviously execute them more than once. This normally happens if they depend on external values that may change per row, ie when you JOIN them in some way.
Your subqueries are straightforward, and since the outer query is restricted to a single order by your constant, they only execute once each; there is nothing in them that would be simplified by adding a JOIN.
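The multiplied sums from the question come from joining two child tables to the same parent row. A small sketch of that effect, using Python's sqlite3 module with an in-memory SQLite database rather than SQL Server: the table is named orders here because order is a reserved word, the prices are invented, and the pre-aggregated derived tables in the second query are one possible join-based alternative that is not taken from the answers in this thread.

```python
import sqlite3

# SQLite stand-in; "orders" replaces the reserved word "order", data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ID INTEGER PRIMARY KEY);
    CREATE TABLE orderlines (orderID INTEGER, price INTEGER);
    CREATE TABLE orderactions (orderID INTEGER, price INTEGER);
    INSERT INTO orders VALUES (1);
    INSERT INTO orderlines VALUES (1, 10), (1, 20);          -- true sum: 30
    INSERT INTO orderactions VALUES (1, 5), (1, 5), (1, 5);  -- true sum: 15
""")

# Two lines x three actions = six joined rows, so every price is
# summed several times over.
fanned_out = conn.execute("""
    SELECT o.ID, SUM(ol.price), SUM(oa.price)
    FROM orders AS o
    LEFT JOIN orderlines AS ol ON ol.orderID = o.ID
    LEFT JOIN orderactions AS oa ON oa.orderID = o.ID
    WHERE o.ID = ?
    GROUP BY o.ID
""", (1,)).fetchone()

# Aggregating each child table before joining keeps one row per order
# on each side, so nothing is multiplied.
pre_aggregated = conn.execute("""
    SELECT o.ID, ol.total, oa.total
    FROM orders AS o
    LEFT JOIN (SELECT orderID, SUM(price) AS total
               FROM orderlines GROUP BY orderID) AS ol ON ol.orderID = o.ID
    LEFT JOIN (SELECT orderID, SUM(price) AS total
               FROM orderactions GROUP BY orderID) AS oa ON oa.orderID = o.ID
    WHERE o.ID = ?
""", (1,)).fetchone()

print(fanned_out)
print(pre_aggregated)
```

Whether the pre-aggregated form is any faster than the scalar subqueries depends on the data and the optimizer, so it is worth comparing execution plans before switching.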
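Depending on your data, this can be a solution (it only gives correct sums when at most one of the two child tables has more than one row per order):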
SELECT order.ID,
SUM(CASE WHEN orderlines.orderID IS NOT NULL THEN orderlines.price ELSE 0 END),
SUM(CASE WHEN orderactions.orderID IS NOT NULL THEN orderactions.price ELSE 0 END)
FROM order
LEFT JOIN orderlines ON orderlines.orderID=order.ID
LEFT JOIN orderactions ON orderactions.orderID=order.ID
WHERE order.ID=#orderID
GROUP BY order.ID

How to improve SQL Query Performance

I have the following DB Structure (simplified):
Payments
----------------------
Id | int
InvoiceId | int
Active | bit
Processed | bit
Invoices
----------------------
Id | int
CustomerOrderId | int
CustomerOrders
------------------------------------
Id | int
ApprovalDate | DateTime
ExternalStoreOrderNumber | nvarchar
Each Customer Order has an Invoice and each Invoice can have multiple Payments.
The ExternalStoreOrderNumber is a reference to the order from the external partner store we imported the order from and the ApprovalDate the timestamp when that import happened.
Now we have the problem that we had a wrong import and need to move some payments to other invoices (several hundred, so too many to do by hand) according to the following logic:
Search the Invoice of the Order which has the same external number as the current one but starts with 0 instead of the current digit.
To do that I created the following query:
UPDATE DB.dbo.Payments
SET InvoiceId=
(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000)))
WHERE Id IN (
SELECT P.Id
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00')
Now I started that query on a test system using the live data (~250.000 rows in each table) and it has now been running for 16 hours. Did I do something completely wrong in the query, or is there a way to speed it up a little?
It is not required to be really fast, as it is a one time task, but several hours seems long to me and as I want to learn for the (hopefully not happening) next time I would like some feedback how to improve...
You might as well kill the query. Your update subquery is completely un-correlated to the table being updated. From the looks of it, when it completes, EVERY SINGLE dbo.payments record will have the same value.
To break down your query, you might find that the subquery runs fine on its own.
SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000))
That is always a BIG worry.
The next thing is that it is running this row-by-row for every record in the table.
You are also double-dipping into Payments: the WHERE Id IN (...) filter selects the ids from a join that involves Payments itself. You can reference the table being updated in the FROM/JOIN clause using this pattern:
UPDATE P
....
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
Moving on, another mistake is to use TOP without ORDER BY. That's asking for arbitrary results. If you know there's only one result, you wouldn't even need TOP. In this case, maybe you're ok with arbitrarily choosing one from many possible matches. Since you have three levels of TOP(1) without ORDER BY, you might as well just mash them all up (join) and take a single TOP(1) across all of them. That would make it look like this:
SET InvoiceId=
(SELECT TOP 1 I.Id
FROM DB.dbo.Invoices AS I
JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId=O.Id
JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber,1,100)
AND OO.Id=I.CustomerOrderId)
However, as I mentioned very early on, this is not being correlated to the main FROM clause at all. We move the entire search into the main query so that we can make use of JOIN-based set operations rather than row-by-row subqueries.
Before I show the final query (fully commented): I think your SUBSTRING is supposed to implement the rule "but starts with 0 instead of the current digit". If I read that correctly, then for an order number '5678' you are looking for '0678', which means SUBSTRING should start at position 2 (2,10000) instead of position 1 (1,10000); starting at 1 keeps the leading digit and yields '05678'.
UPDATE P
SET InvoiceId=II.Id
FROM DB.dbo.Payments AS P
-- invoices for payments
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
-- orders for invoices
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- another order with '0' as leading digit
JOIN DB.dbo.CustomerOrders AS OO
ON OO.ExternalOrderNumber='0'+substring(O.ExternalOrderNumber,2,1000)
-- invoices for this other order
JOIN DB.dbo.Invoices AS II ON OO.Id=II.CustomerOrderId
-- conditions for the Payments records
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
It is worth noting that SQL Server allows UPDATE ... FROM ... JOIN, which is less widely supported by other DBMS, e.g. Oracle. One reason is that a single row in Payments (the update target) can match many candidate II.Id values through the joins, and in that case you get an arbitrary one of them.
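The SUBSTRING start position discussed above can be checked quickly. This sketch uses SQLite's substr function through Python's sqlite3 module as a stand-in; like SQL Server's SUBSTRING it counts positions from 1, and || is SQLite's string concatenation where SQL Server uses +. The order number '5678' is the example value from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Position 1 keeps the leading digit; position 2 drops it, so that the
# prepended '0' replaces the first digit instead of being added in front.
from_first, from_second = conn.execute(
    "SELECT '0' || substr(?, 1, 10000), '0' || substr(?, 2, 10000)",
    ("5678", "5678"),
).fetchone()

print(from_first)   # leading digit kept
print(from_second)  # leading digit replaced by 0
```

Starting at position 1 gives '05678', while starting at position 2 gives the intended '0678'.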
I think something like this will be more efficient, if I understood your query right. As I wrote it by hand and didn't run it, it may have some syntax errors.
UPDATE DB.dbo.Payments
SET InvoiceId=(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
inner join DB.dbo.CustomerOrders AS O ON I.CustomerOrderId=O.Id
inner join DB.dbo.CustomerOrders AS OO ON OO.Id=I.CustomerOrderId
and O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber, 1, 10000))
FROM DB.dbo.Payments
JOIN DB.dbo.Invoices AS I ON I.Id=Payments.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE Payments.Active=0
AND Payments.Processed=0
AND O.ApprovalDate='2012-07-19 00:00:00'
Try to rewrite it using JOINs. This will highlight some of the problems. Will the following query do just the same? (The queries are somewhat different, but I guess this is roughly what you're trying to do.)
UPDATE P
SET InvoiceId = I.Id
FROM DB.dbo.Payments AS P
CROSS JOIN DB.dbo.Invoices AS I
INNER JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId = O.Id
INNER JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber = '0' + SUBSTRING(OO.ExternalOrderNumber, 1, 10000)
AND OO.Id = I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
As you see, two problems stand out:
The unconditional join between Payments and Invoices (of course, you've cut this off with a TOP 1, but set-wise it's still unconditional). I'm not really sure if this really is a problem in your query. It will be in mine though :).
The join on an expression of up to 10000 characters (SUBSTRING) inside the join condition. This is highly inefficient, because the expression has to be computed for every row before it can be compared.
If you need a one-time speedup, just take the queries on each table, store the intermediate results in temporary tables, create indexes on those temporary tables and use the temporary tables to perform the update.

Resources