Adding column to select statement brings in all historical data - sql-server

Good evening all!
I'm running into a really odd issue that I'm having trouble understanding.
I have 3 tables (parts table, parts move history and a parts detail table).
What I'm trying to do is have the result set return lot#,part#,product description,quantity,part location, what's currently in inventory (versus full history) and who last moved the product.
Now, for the query. When I run the below query, I get a result set of 4,751 rows; which lines up perfectly with my expected results. However, when I try to add in the userid field, I then get a result set of 186,573. This large result set appears to pull in all historic data versus just matching the userid to the 4,751 rows I actually need.
From the Parts Table I need (prod_desc)
From the Parts Detail Table I need (lot,part#,lotquantity,prtlocation)
From the Parts Move History Table I need (move_date,user_id)
4,751 Query:
SELECT inv.lot,inv.part#,prt.prod_desc,inv.lotquantity,inv.prtlocation,
MAX(mv.move_date) AS 'Move Date'
FROM invdet AS inv
LEFT JOIN movetable AS mv ON inv.part# = mv.part#
LEFT JOIN partmstr AS prt ON inv.part# = prt.part#
GROUP BY inv.lot,inv.part#,prt.prod_desc,inv.lotquantity,inv.prtlocation
ORDER BY inv.prtlocation
186,573 Query:
SELECT inv.lot,inv.part#,prt.prod_desc,inv.lotquantity,inv.prtlocation,mv.user_id,
MAX(mv.move_date) AS 'Move Date'
FROM invdet AS inv
LEFT JOIN movetable AS mv ON inv.part# = mv.part#
LEFT JOIN partmstr AS prt ON inv.part# = prt.part#
GROUP BY inv.lot,inv.part#,prt.prod_desc,inv.lotquantity,inv.prtlocation,mv.user_id
ORDER BY inv.prtlocation
If I don't use the MAX function, I do not get current inventory and instead get all results in the table, which I do not need. I'm still learning and my GROUP BY's leave a lot to be desired as I'm still wrapping my head around it (open to suggestions!). I'm sure there's a subquery I can throw in here somewhere, but I'm still figuring those out as well. Any help is greatly appreciated!

I think the problem is that once mv.user_id is in the GROUP BY, you get one row for every user who ever moved the part, not only the row for the last movement with MAX(mv.move_date).
One way is to remove the LEFT JOIN to movetable and use a CROSS APPLY instead, like
SELECT inv.lot,inv.part,prt.prod_desc,inv.lotquantity,inv.prtlocation,x.move_date,x.user_id
FROM invdet AS inv
CROSS APPLY (SELECT TOP 1 mv.move_date,mv.user_id
FROM movetable mv
WHERE inv.part=mv.part
ORDER BY mv.move_date DESC) AS x
LEFT JOIN partmstr AS prt ON inv.part=prt.part
ORDER BY inv.prtlocation
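I've not tested it, but it should be fine. It may be a bit slow, because CROSS APPLY executes one subquery for each row in the inv table. Note that CROSS APPLY drops inventory rows that have no movement at all; OUTER APPLY keeps them, like the original LEFT JOIN. If it is too slow, you can use ROW_NUMBER to build a derived table containing only the last movement per part and then use it in the LEFT JOIN, as follows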
SELECT inv.lot,inv.part,prt.prod_desc,inv.lotquantity,inv.prtlocation,y.move_date,y.user_id
FROM invdet AS inv
LEFT JOIN(SELECT x.user_id,x.move_date,x.part
FROM (SELECT mv.user_id,mv.move_date,mv.part,rn=ROW_NUMBER() OVER(PARTITION BY mv.part ORDER BY mv.move_date DESC)
FROM movetable mv) AS x
WHERE x.rn=1) AS y ON y.part=inv.part
LEFT JOIN partmstr AS prt ON inv.part=prt.part
ORDER BY inv.prtlocation
Hope it helps.
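To see why adding user_id to the GROUP BY multiplies the rows, and how the ROW_NUMBER approach avoids it, here is a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, purely so the example is self-contained (an assumption; it needs SQLite 3.25 or later for window functions). Table and column names follow the question, with part# shortened to part, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invdet (lot TEXT, part TEXT, lotquantity INTEGER, prtlocation TEXT);
    CREATE TABLE movetable (part TEXT, move_date TEXT, user_id TEXT);
    -- Invented sample data: part P1 was moved by two different users.
    INSERT INTO invdet VALUES ('L1', 'P1', 10, 'A1'), ('L2', 'P2', 5, 'B1');
    INSERT INTO movetable VALUES
        ('P1', '2020-01-01', 'alice'),
        ('P1', '2020-02-01', 'bob'),
        ('P2', '2020-01-15', 'carol');
""")

# Grouping by user_id yields one row per (inventory row, user) pair,
# so every user who ever moved the part shows up.
grouped = conn.execute("""
    SELECT inv.lot, inv.part, mv.user_id, MAX(mv.move_date)
    FROM invdet AS inv
    LEFT JOIN movetable AS mv ON inv.part = mv.part
    GROUP BY inv.lot, inv.part, mv.user_id
    ORDER BY inv.lot, mv.user_id
""").fetchall()

# Ranking the moves per part and keeping rank 1 leaves only the latest move.
latest = conn.execute("""
    SELECT inv.lot, inv.part, y.user_id, y.move_date
    FROM invdet AS inv
    LEFT JOIN (
        SELECT part, user_id, move_date,
               ROW_NUMBER() OVER (PARTITION BY part ORDER BY move_date DESC) AS rn
        FROM movetable
    ) AS y ON y.part = inv.part AND y.rn = 1
    ORDER BY inv.lot
""").fetchall()

print(len(grouped), "rows when grouping by user_id")
print(len(latest), "rows when keeping only the latest move per part")
```

With this data the grouped query returns one extra row for P1, while the ranked query returns exactly one row per inventory line.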


SQL - join two tables based on up-to-date entries

I have two tables
1- Table of TestModules
2- Table of TestModule_Results
In order to get the required information for each TestModule, I am using a FULL OUTER JOIN and it works fine.
But what is required is slightly different. The above picture shows that TestModuleID = 5 is listed twice, and the requirement is to list the 'up-to-date' results based on time 'ChangedAt'
Of course, I can do the following:
SELECT TOP 1 * FROM TestModule_Results
WHERE DeviceID = 'xxx' and TestModuleID = 'yyy'
But this solution is for a single row and I want to do it in a Stored Procedure.
Expected output should be like:
Any advice on how I can implement it in a SP?
Use a Common Table Expression and ROW_NUMBER to add a field identifying the newest results, if any, and select just those:
--NOTE: a Common Table Expression requires the previous command
--to be explicitly terminated; prepending a ; covers that
;WITH cteTR as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DeviceID, TestModuleID
ORDER BY ChangedAt DESC) AS ResultOrder
FROM TestModule_Results
)
--cteTR is now just like TestModule_Results but has an
--additional field ResultOrder that is 1 for the newest,
--2 for the second newest, etc. for every unique (DeviceID,TestModuleID) pair
SELECT *
FROM TestModules as M --Use INNER JOIN to get only modules with results,
--or LEFT OUTER JOIN to include modules without any results yet
INNER JOIN cteTR as R
ON M.DeviceID = R.DeviceID AND M.TestModuleID = R.TestModuleID
WHERE R.ResultOrder = 1
-- OR R.ResultOrder IS NULL --add if Left Outer Join
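The LEFT OUTER JOIN variant mentioned in the comments is easy to get wrong, because filtering on ResultOrder = 1 alone would discard modules without results. A minimal sketch of that variant follows, run against SQLite through Python's sqlite3 module instead of SQL Server (an assumption for the sake of a self-contained example; SQLite 3.25 or later is needed). The Result column and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestModules (DeviceID TEXT, TestModuleID INTEGER);
    CREATE TABLE TestModule_Results (
        DeviceID TEXT, TestModuleID INTEGER, Result TEXT, ChangedAt TEXT);
    -- Module 5 has two results, module 6 has none yet (invented data).
    INSERT INTO TestModules VALUES ('dev1', 5), ('dev1', 6);
    INSERT INTO TestModule_Results VALUES
        ('dev1', 5, 'fail', '2021-01-01 10:00'),
        ('dev1', 5, 'pass', '2021-01-02 10:00');
""")

rows = conn.execute("""
    WITH cteTR AS (
        SELECT *, ROW_NUMBER() OVER (
                   PARTITION BY DeviceID, TestModuleID
                   ORDER BY ChangedAt DESC) AS ResultOrder
        FROM TestModule_Results
    )
    SELECT M.DeviceID, M.TestModuleID, R.Result, R.ChangedAt
    FROM TestModules AS M
    LEFT OUTER JOIN cteTR AS R
        ON M.DeviceID = R.DeviceID AND M.TestModuleID = R.TestModuleID
    -- Keep the newest result, or the NULL-extended row when there is none.
    WHERE R.ResultOrder = 1 OR R.ResultOrder IS NULL
    ORDER BY M.TestModuleID
""").fetchall()

for row in rows:
    print(row)
```

Module 5 appears once with its newest result, and module 6 appears with NULL result columns instead of disappearing.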
You say "this solution is for a single row"? Excellent. Use CROSS APPLY and change the WHERE clause from hand-input literals to the fields of the original table. APPLY operates at row level.
SELECT *
FROM TestModules t
CROSS APPLY (
SELECT TOP 1 * FROM TestModule_Results r
WHERE r.DeviceID = t.DeviceID AND r.TestModuleID = t.TestModuleID -- put the connecting fields here
ORDER BY r.ChangedAt DESC) AS latest

SQL Left Outer Join?

I have a table that should join to another table based on the unique id. If I do a LEFT OUTER JOIN I get duplicates. If I put DISTINCT in my SELECT I get the correct number of records. But if I then include any field from the table that I LEFT OUTER JOINed, I get duplicates again. Here is my query:
FROM Table1
LEFT OUTER JOIN Table2
ON Table2.user_id = Table1.userid
In the example above I'm getting duplicates, also I have tried to do:
LEFT OUTER JOIN (
SELECT user_id
FROM Table2
GROUP BY user_id
) AS t2 ON Table1.user_id = t2.user_id
This gave me correct number of records but I need some additional columns from that second table, after I include extra columns I'm getting duplicates again, example:
LEFT OUTER JOIN (
SELECT user_id, address
FROM Table2
GROUP BY user_id, address
) AS t2 ON Table1.user_id = t2.user_id
I'm wondering if I missed something or if there is a better way to handle this type of problem. If anyone sees something or knows a better solution, please let me know.
It is impossible for you to pick the correct answer here without understanding your data.
It seems that Table2 supports multiple addresses per user_id. This is a common design. If you want to return only one address per user_id you have several options:
Fix the data - Remove the duplicate addresses from table 2 and add a constraint that prevents this situation again. You will need to determine which addresses are incorrect.
Reduce the left join to only include one address per user - How you do this will depend on your other data. You could use min() or max() with a group by if you don't care which one to return where there are multiples or you will need to perhaps order by an effective date and take the latest one - or maybe there are billing and shipping addresses and you should pick the correct one.
Accept that there are multiple addresses per user - this may be correct - and adjust the rest of your code.
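The second option, reducing the joined table to one address per user, can be sketched as follows. This is a minimal illustration in Python with SQLite, not the asker's actual schema: the name column and all sample rows are invented, and MIN(address) stands in for "any one address" when you do not care which one is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (user_id INTEGER, name TEXT);
    CREATE TABLE Table2 (user_id INTEGER, address TEXT);
    -- User 1 has two addresses, user 3 has none (invented data).
    INSERT INTO Table1 VALUES (1, 'ann'), (2, 'ben'), (3, 'cy');
    INSERT INTO Table2 VALUES (1, '1 Oak St'), (1, '9 Elm St'), (2, '4 Pine Rd');
""")

# Joining the raw table repeats user 1 once per address.
duplicated = conn.execute("""
    SELECT t1.user_id, t1.name, t2.address
    FROM Table1 AS t1
    LEFT OUTER JOIN Table2 AS t2 ON t1.user_id = t2.user_id
""").fetchall()

# Aggregating to one row per user_id BEFORE the join removes the duplicates.
# Grouping by (user_id, address), as in the question, cannot do that because
# each distinct address still forms its own group.
one_per_user = conn.execute("""
    SELECT t1.user_id, t1.name, t2.address
    FROM Table1 AS t1
    LEFT OUTER JOIN (
        SELECT user_id, MIN(address) AS address
        FROM Table2
        GROUP BY user_id
    ) AS t2 ON t1.user_id = t2.user_id
    ORDER BY t1.user_id
""").fetchall()

print(len(duplicated), "rows from the plain join")
print(one_per_user)
```

If a specific address is wanted (for example the latest by an effective date), the aggregate would be replaced by a ranking on that date.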

Filling up a table in sql with data from another table only if it does not already exist

I am working on a problem in SQL Server that is mind boggling. What I am trying to accomplish is, I have a table temp2 (picture below) that houses data from a lot of inner joins which is then used for a SSRS report.
The problem I am trying to solve is, how can I fill in the missing titles for each employee even if they have not put any values in it for the dates provided?
Question is, is it possible to fill in the missing titles from ProjectName for each Employee? As seen in the SSRS report, each employee should have all of the ProjectName being returned from the data set which is reading the table temp2...
This is what I tried. Even though I have gotten all the project names into my temp2, it is ugly and inefficient. The SSRS report will take too long to run because of the unwanted data.
Select distinct Employee = Coalesce(a.Employee, @SelectEmployee), EmpId = Coalesce(a.EmpId, (Select PkId from AllRef Where Ness='All')), c.Day, Title=Coalesce((case when a.Title like '%-%'
then left(a.Title, charindex('-', a.Title))
else a.Title
end),''), p.ProjectName, Description =coalesce(a.Description,''), Val = Coalesce(a.Val,''), AbbrevJob = COALESCE(a.abbrevjob, ''),
week1Total=(select sum(val) as week1 from temp1 WHERE day >= Dateadd("d", -14, @WeekEnding) AND day <= Dateadd("d", -7, @WeekEnding)),
week2Total=(select sum(val) as week2 from temp1 WHERE day >= Dateadd("d", -7, @WeekEnding) AND day <= @WeekEnding )
from dbo.Calender as c
left outer join temp2 as a
on c.Day = a.Day
cross join ProjectName p
--on p.PkId = a.Abbrevjob-2
Where c.Day >= Dateadd("d",-13,@WeekEnding) and c.Day <= @WeekEnding
order by EmpId asc
The cross join did accomplish the task, but the repetition is killing performance. Does anyone know how to deal with that?
The usual way to do that is to build a matrix of all employees and all projects, and then optionally query the hours. For example:
; with Employees as
(
select distinct EmployeeName
from TableWithEmployees
)
, Projects as
(
select distinct ProjectName
from TableWithProjects
)
select *
from Employees e
cross join
Projects p
left join
TableWithDetails d
on d.EmployeeName = e.EmployeeName
and d.ProjectName = p.ProjectName
The left join means that rows without details will not be filtered out.
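The employee-by-project matrix described above can be tried out with a small sketch. It runs the same shape of query against SQLite via Python's sqlite3 module (an assumption; the question is about SQL Server). The table names follow the answer, while the Hours column and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableWithEmployees (EmployeeName TEXT);
    CREATE TABLE TableWithProjects (ProjectName TEXT);
    CREATE TABLE TableWithDetails (EmployeeName TEXT, ProjectName TEXT, Hours REAL);
    INSERT INTO TableWithEmployees VALUES ('ann'), ('ben');
    INSERT INTO TableWithProjects VALUES ('alpha'), ('beta');
    -- Only one employee/project pair has any hours booked (invented data).
    INSERT INTO TableWithDetails VALUES ('ann', 'alpha', 8);
""")

matrix = conn.execute("""
    WITH Employees AS (
        SELECT DISTINCT EmployeeName FROM TableWithEmployees
    ),
    Projects AS (
        SELECT DISTINCT ProjectName FROM TableWithProjects
    )
    SELECT e.EmployeeName, p.ProjectName, d.Hours
    FROM Employees AS e
    CROSS JOIN Projects AS p          -- every employee paired with every project
    LEFT JOIN TableWithDetails AS d   -- details attached only where they exist
        ON d.EmployeeName = e.EmployeeName
       AND d.ProjectName = p.ProjectName
    ORDER BY e.EmployeeName, p.ProjectName
""").fetchall()

for row in matrix:
    print(row)
```

Because the cross join is applied to the two small distinct lists rather than to the detail rows, the result has exactly employees times projects rows, plus extra rows only where a pair has several detail records.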
Though I have never done it before, the solution that comes to mind is pretty simple.
The problem with that report is that you inner join the data with the project name table. Normally that is not a problem; it is the approach everyone uses for this type of report.
But for the requirement you stated, it is the problem.
To get the whole list of project names you would outer join the project name table. But then you still only get the projects per employee, and any project name not filled in by an employee record will appear in the list only once, not assigned to any employee. That is not what you need.
So the solution is to combine employees and project names first, and then outer join your data table to that combination.
You can pick the DISTINCT empId values from your data table and FULL OUTER JOIN them to ProjectNames. Then LEFT OUTER JOIN the result to the data table, this time on empId = empId and PkId = AbbrevJob (if I have understood correctly which columns hold the project name ids).
It should work. Please let me know whether it does or not, and good luck with that!

Is it possible to perform a join in Access on a second column if the first is blank?

I have this ugly source data with two columns, let's call them EmpID and SomeCode. Generally EmpID maps to the EmployeeListing table. But sometimes, people are entering the Employee IDs in the SomeCode field.
The person previously running this report in Excel 'solved' this problem by performing multiple vlookups with if statements, as well as running some manual checks to ensure results were accurate. As I'm moving these files to Access I am not sure how best to handle this scenario.
Ideally, I'm hoping to tell my queries to do a Left Join on SomeCode if EmpID is null, otherwise Left Join on EmpID
Unfortunately, there's no way for me to force validation or anything of the sort in the source data.
Here's the full SQL query I'm working on:
SELECT DDATransMaster.Fulfillment,
NZ([DDATransMaster]![DirectSellerNumber],[DDATransMaster]![PromotionCode]) AS EmpJoin,
EmployeeLookup.ID AS EmpLookup,
LEFT JOIN EmployeeLookup ON NZ([DDATransMaster]![DirectSellerNumber],[DDATransMaster]![PromotionCode]) = EmployeeLookup.[Employee #])
You can create a query like this:
SELECT IIf(EmpID Is Null, SomeCode, EmpID) AS join_field
FROM YourTable
Or if the query will always be used within an Access session, Nz is more concise.
SELECT Nz(EmpID, SomeCode) AS join_field
FROM YourTable
When you join that query to your other table, the Access query designer can represent the join between join_field and some matching field in the other table. If you were to attempt the IIf or Nz as part of the join's ON clause, the query designer can't display the join correctly in Design View. It could still work, but may not be as convenient if you're new to Access SQL.
See whether this SQL gives you what you want.
SELECT dda.Fulfillment,
NZ(dda.DirectSellerNumber,dda.PromotionCode) AS EmpJoin,
el.ID AS EmpLookup
FROM DDATransMaster AS dda
LEFT JOIN EmployeeLookup AS el
ON NZ(dda.DirectSellerNumber,dda.PromotionCode) = el.[Employee #]
But I would use the Nz part in a subquery.
SELECT sub.Fulfillment, sub.EmpJoin,
el.ID AS EmpLookup
FROM (
SELECT Fulfillment,
NZ(DirectSellerNumber,PromotionCode) AS EmpJoin
FROM DDATransMaster
) AS sub
LEFT JOIN EmployeeLookup AS el
ON sub.EmpJoin = el.[Employee #]
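The fallback-join idea in the subquery version can be checked outside Access with a small sketch. It uses Python's sqlite3 module, and because SQLite has no Nz function, COALESCE is swapped in as the closest equivalent (it returns its first non-NULL argument). Table and column names follow the question; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DDATransMaster (
        Fulfillment TEXT, DirectSellerNumber TEXT, PromotionCode TEXT);
    CREATE TABLE EmployeeLookup (ID INTEGER, [Employee #] TEXT);
    -- F1 has the id in the right column, F2 has it in PromotionCode,
    -- F3 has neither (invented data).
    INSERT INTO DDATransMaster VALUES
        ('F1', '100', NULL),
        ('F2', NULL, '200'),
        ('F3', NULL, NULL);
    INSERT INTO EmployeeLookup VALUES (1, '100'), (2, '200');
""")

rows = conn.execute("""
    SELECT sub.Fulfillment, sub.EmpJoin, el.ID AS EmpLookup
    FROM (
        -- COALESCE stands in for Access's Nz here.
        SELECT Fulfillment,
               COALESCE(DirectSellerNumber, PromotionCode) AS EmpJoin
        FROM DDATransMaster
    ) AS sub
    LEFT JOIN EmployeeLookup AS el
        ON sub.EmpJoin = el.[Employee #]
    ORDER BY sub.Fulfillment
""").fetchall()

for row in rows:
    print(row)
```

Computing the join key once in the subquery keeps the ON clause a plain equality between two columns.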
What about:
LEFT JOIN EmployeeListing ON NZ(EmpID, SomeCode)
as your join? Nz() returns the second parameter if the first is null. I'm not 100% sure this sort of join works in Access. Worth 20 seconds to try, though.
Hope it works.
You could use a UNION:
SELECT DDATransMaster.Fulfillment,
EmployeeLookup.ID AS EmpLookup
FROM DDATransMaster
LEFT JOIN EmployeeLookup ON
DDATransMaster.DirectSellerNumber = EmployeeLookup.[Employee #]
where DDATransMaster.DirectSellerNumber IS NOT NULL
UNION ALL
SELECT DDATransMaster.Fulfillment,
EmployeeLookup.ID AS EmpLookup
FROM DDATransMaster
LEFT JOIN EmployeeLookup ON
DDATransMaster.PromotionCode = EmployeeLookup.[Employee #]
where DDATransMaster.DirectSellerNumber IS NULL;

How to improve SQL Query Performance

I have the following DB Structure (simplified):
Payments
Id | int
InvoiceId | int
Active | bit
Processed | bit
Invoices
Id | int
CustomerOrderId | int
CustomerOrders
Id | int
ApprovalDate | DateTime
ExternalStoreOrderNumber | nvarchar
Each Customer Order has an Invoice and each Invoice can have multiple Payments.
The ExternalStoreOrderNumber is a reference to the order from the external partner store we imported the order from and the ApprovalDate the timestamp when that import happened.
Now we have the problem that we had a wrong import and need to move some payments to other invoices (several hundred, so too many to do by hand) according to the following logic:
Search the Invoice of the Order which has the same external number as the current one but starts with 0 instead of the current digit.
To do that I created the following query:
UPDATE DB.dbo.Payments
SET InvoiceId=
(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000)))
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
Now I started that query on a test system using the live data (~250,000 rows in each table) and it has now been running for 16 hours. Did I do something completely wrong in the query, or is there a way to speed it up a little?
It is not required to be really fast, as it is a one time task, but several hours seems long to me and as I want to learn for the (hopefully not happening) next time I would like some feedback how to improve...
You might as well kill the query. Your update subquery is completely un-correlated to the table being updated. From the looks of it, when it completes, EVERY SINGLE dbo.payments record will have the same value.
To break down your query, you might find that the subquery runs fine on its own.
SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000))
That is always a BIG worry.
The next thing is that it is running this row-by-row for every record in the table.
You are also double-dipping into Payments: the statement names Payments as the update target and then joins Payments again in the FROM clause. You can reference the table being updated through its alias in the FROM clause using this pattern:
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
Moving on, another mistake is to use TOP without ORDER BY. That's asking for random results. If you know there's only one result, you wouldn't even need TOP. In this case, maybe you're ok with randomly choosing one from many possible matches. Since you have three levels of TOP(1) without ORDER BY, you might as well just mash them all up (join) and take a single TOP(1) across all of them. That would make it look like this
SET InvoiceId=
(SELECT TOP 1 I.Id
FROM DB.dbo.Invoices AS I
JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId=O.Id
JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber,1,100)
AND OO.Id=I.CustomerOrderId)
However, as I mentioned very early on, this is not being correlated to the main FROM clause at all. We move the entire search into the main query so that we can make use of JOIN-based set operations rather than row-by-row subqueries.
Before I show the final query (fully commented): I think your SUBSTRING is supposed to implement the logic "but starts with 0 instead of the current digit". However, if I read that correctly, for an order number '5678' you are looking for '0678', which means that SUBSTRING should be using 2,10000 instead of 1,10000.
UPDATE P
SET InvoiceId=II.Id
FROM DB.dbo.Payments AS P
-- invoices for payments
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
-- orders for invoices
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- another order with '0' as leading digit
JOIN DB.dbo.CustomerOrders AS OO
ON OO.ExternalOrderNumber='0'+substring(O.ExternalOrderNumber,2,1000)
-- invoices for this other order
JOIN DB.dbo.Invoices AS II ON OO.Id=II.CustomerOrderId
-- conditions for the Payments records
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
It is worth noting that SQL Server allows UPDATE ..FROM ..JOIN, which is less widely supported by other DBMSs, e.g. Oracle. One reason is that a single row in Payments (the update target) can match many rows across the joins and therefore have many candidate values of II.Id. When that happens, you get an arbitrary one of the possible II.Id values.
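Before running such an UPDATE, it can help to run the same joins as a plain SELECT and look at the candidate invoice ids per payment. The sketch below does that in SQLite through Python's sqlite3 module, so it uses || and substr in place of T-SQL's + and SUBSTRING (an assumption made for a self-contained example). The sample rows are invented, and the target order is deliberately given two invoices to show the ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payments (Id INTEGER, InvoiceId INTEGER, Active INTEGER, Processed INTEGER);
    CREATE TABLE Invoices (Id INTEGER, CustomerOrderId INTEGER);
    CREATE TABLE CustomerOrders (Id INTEGER, ApprovalDate TEXT, ExternalOrderNumber TEXT);
    -- Order 1 is the wrongly imported one; order 2 has the same number with a leading 0.
    INSERT INTO CustomerOrders VALUES
        (1, '2012-07-19 00:00:00', '5678'),
        (2, '2012-07-01 00:00:00', '0678');
    -- The target order has TWO invoices, so the payment has two candidates.
    INSERT INTO Invoices VALUES (10, 1), (20, 2), (21, 2);
    INSERT INTO Payments VALUES (100, 10, 0, 0);
""")

candidates = conn.execute("""
    SELECT P.Id AS PaymentId, II.Id AS NewInvoiceId
    FROM Payments AS P
    JOIN Invoices AS I ON I.Id = P.InvoiceId
    JOIN CustomerOrders AS O ON O.Id = I.CustomerOrderId
    -- Replace the first character with '0': substr(x, 2) is everything from
    -- the second character onwards.
    JOIN CustomerOrders AS OO
        ON OO.ExternalOrderNumber = '0' || substr(O.ExternalOrderNumber, 2)
    JOIN Invoices AS II ON II.CustomerOrderId = OO.Id
    WHERE P.Active = 0 AND P.Processed = 0
      AND O.ApprovalDate = '2012-07-19 00:00:00'
    ORDER BY II.Id
""").fetchall()

print(candidates)
```

If any payment shows up more than once in such a result, the UPDATE would pick one of the candidates arbitrarily, so the join conditions need tightening first.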
I think something like this will be more efficient, if I understood your query right. As I wrote it by hand and didn't run it, it may have some syntax errors.
UPDATE DB.dbo.Payments
SET InvoiceId=(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
inner join DB.dbo.CustomerOrders AS O ON I.CustomerOrderId=O.Id
inner join DB.dbo.CustomerOrders AS OO On OO.Id=I.CustomerOrderId
and O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber, 1, 10000))
FROM DB.dbo.Payments
JOIN DB.dbo.Invoices AS I ON I.Id=Payments.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE Payments.Active=0
AND Payments.Processed=0
AND O.ApprovalDate='2012-07-19 00:00:00'
Try to rewrite it using JOINs. This will highlight some of the problems. Would the following query do just the same? (The queries are somewhat different, but I guess this is roughly what you're trying to do.)
UPDATE P
SET InvoiceId= I.Id
FROM DB.dbo.Payments AS P
CROSS JOIN DB.dbo.Invoices AS I
INNER JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId = O.Id
INNER JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber = '0' + SUBSTRING(OO.ExternalOrderNumber, 1, 10000)
AND OO.Id = I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
As you see, two problems stand out:
The unconditional join between Payments and Invoices (of course, you've cut this off with a TOP 1, but set-wise it's still unconditional). I'm not really sure whether this really is a problem in your query. It will be in mine, though :).
The join on a 10000-character column (SUBSTRING), embodied in a condition. This is highly inefficient.
If you need a one-time speedup, run the queries on each table separately, store the intermediate results in temporary tables, create indexes on those temporary tables, and use the temporary tables to perform the update.
