SQL Join Tables on Closest Date BEFORE Shipped Date - sql-server

This may appear to be a repeat question but it is not because the other solutions in the forum don't work in this situation.
This is a query to our ERP database that is trying to get the final cost of goods sold total for parts. Basically the ERP makes it easy to get all the direct costs out but doesn't calculate scrap costs.
Where I'm stuck is in the sub query in the FROM section marked:
>>>>>>HELP NEEDED STARTING HERE
The sub query, as it is now written, pulls out all the shipments to our scrap vendor and gets a monthly average rate per pound, then joins to the other tables based on alloy type, month and year.
My Finance Department has told me the monthly average is not a good solution: some metal prices fluctuate too much, and the scrap metal is not always sold in the same month the parts shipped.
I need to get the rate we are paid for scrap metal from the closest date before the part was shipped from our facility.
I've found other examples on Stack Overflow that show ways to do this but the main tables and the sub query tables overlap so the other solutions I've seen have failed. I have commented in the code below to show and explain this.
I'm totally open to the idea I've approached this wrong. How can I make this work?
DECLARE @Date_From AS DATETIME;
DECLARE @Date_To AS DATETIME;
SET @Date_From = '2016-10-01 00:00:00.000';
SET @Date_To = GETDATE();
-- Start Main query
SELECT TOP 10
    CCustomer.Customer_Type AS 'Industry'
    ,SShipper.Ship_Date AS 'Ship_Date'
    ,SSContainer.Serial_No AS 'Serial_No'
    ,PPart.Grade AS 'Alloy'
    ,tbl_ScrapValue.Scrap_Value_per_lb AS 'Scap_Value_per_lb'
FROM
    Sales_v_Shipper_Line AS SSLine
    JOIN Sales_v_Shipper AS SShipper
        ON SShipper.Shipper_Key = SSLine.Shipper_Key
    JOIN Part_v_Part AS PPart
        ON SSLine.Part_Key = PPart.Part_Key
    JOIN Common_v_Customer AS CCustomer
        ON SShipper.Customer_No = CCustomer.Customer_No
    -- >>>>>>HELP NEEDED STARTING HERE
    -- Below is the sub query that pulls the scrap sales value per pound.
    -- The key point is that both shipments to our customers of real parts,
    -- and the 'shipments' of scrap metal sales come from the same tables,
    -- mainly Part_v_Part and Sales_v_Shipper, because of that the other
    -- solutions for the 'join by closest date' in the forums don't work.
    LEFT OUTER JOIN (SELECT
            MONTH(SShipper.Ship_Date) AS 'Scrap_Ship_Month'
            ,YEAR(SShipper.Ship_Date) AS 'Scrap_Ship_Year'
            ,PPart.Grade AS 'Alloy'
            ,AVG(AARIDist.Unit_Price) AS 'Scrap_Value_per_lb'
        FROM
            Sales_v_Shipper AS SShipper
            JOIN Sales_v_Shipper_Line AS SS_Line
                ON SShipper.Shipper_Key = SS_Line.Shipper_Key
            JOIN Part_v_Part AS PPart
                ON SS_Line.Part_Key = PPart.Part_Key
            JOIN Common_v_Customer AS CCustomer
                ON SShipper.Customer_No = CCustomer.Customer_No
        WHERE CCustomer.Customer_Code = 'Scrap_Vendor'
            AND SShipper.Ship_Date <= @Date_To
        GROUP BY
            MONTH(SShipper.Ship_Date)
            ,YEAR(SShipper.Ship_Date)
            ,PPart.Grade
        ) AS tbl_ScrapValue
        ON PPart.Grade = tbl_ScrapValue.Alloy
        AND
        YEAR(SShipper.Ship_Date) = YEAR(tbl_ScrapValue.Scrap_Ship_Year)
        AND
        MONTH(SShipper.Ship_Date) = (tbl_ScrapValue.Scrap_Ship_Month)
    --- >>>>HELP NEEDED ENDS HERE
WHERE
    SShipper.Ship_Date >= @Date_From
    AND SShipper.Ship_Date <= @Date_To
GROUP BY
    SShipper.Shipper_No
    ,SShipper.Ship_Date
    ,CCustomer.Customer_Type
    ,SSContainer.Quantity
    ,PPart.Grade
Here's a sample output from the query above. As you can see, 'Scap_Value_per_lb' comes back NULL for every row:
Industry             Ship_Date             Serial_No  Alloy  Scap_Value_per_lb
Material Processing  17-Oct-16 4:47:00 PM  S472091    C182   NULL
Material Processing  17-Oct-16 4:47:00 PM  S472210    C182   NULL
Material Processing  17-Oct-16 4:47:00 PM  S472211    C182   NULL
Electronics          17-Oct-16 4:27:00 PM  S436738    C180   NULL
Electronics          17-Oct-16 4:27:00 PM  S463290    C180   NULL
Electronics          17-Oct-16 4:27:00 PM  S463315    C180   NULL
Electronics          17-Oct-16 4:27:00 PM  S463327    C180   NULL
Electronics          17-Oct-16 4:27:00 PM  S463333    C180   NULL
Electronics          17-Oct-16 4:27:00 PM  S463345    C180   NULL
Electronics          17-Oct-16 4:27:00 PM  S463354    C180   NULL
Update
This was edited a second time at 7am on 10/19/2016 to simplify the code further and to add comments in the code, based on feedback from others.

So, if I understand it correctly, you want the sub query to return only ONE row (Same for every row in the outer query?)
SELECT Top 1
SShipper.Ship_Date
,PPart.Grade AS 'Alloy'
,AARIDist.Unit_Price AS 'Scrap_Value_per_lb'
FROM
Sales_v_Shipper AS SShipper
JOIN Sales_v_Shipper_Line AS SS_Line
ON SShipper.Shipper_Key = SS_Line.Shipper_Key
JOIN Part_v_Part AS PPart
ON SS_Line.Part_Key = PPart.Part_Key
JOIN Common_v_Customer AS CCustomer
ON SShipper.Customer_No = CCustomer.Customer_No
JOIN Accounting_v_AR_Invoice_Dist AS AARIDist
ON SS_Line.Shipper_Line_Key = AARIDist.Shipper_Line_Key
WHERE CCustomer.Customer_Code = 'Scrap_Value'
AND SShipper.Ship_Date <= @Date_To
AND AARIDIst.Unit_Price < AARIDist.Quantity
AND AARIDist.Unit_Price > '0'
Order By SShipper.Ship_Date Desc
) AS tbl_SValue
If the value of the sub query should be different for each row of the outer query, then I need to know how each row in the sub query is joined to the row in the outer query.

So when you sell a metal of a certain alloy, you sell everything that's available, i.e. no remainders?
Then you take the records from the buying table and join them with the next selling record for the same alloy. This can be achieved with a cross apply. Here is a query with simple tables to give you the idea what's needed:
select
year(matched_sold.sold_date),
month(matched_sold.sold_date),
sum(bought.amount * bought.price)
from bought
cross apply
(
select top 1 *
from sold
where sold.alloy = bought.alloy
and sold.sold_date > bought.bought_date
order by sold.sold_date asc -- ascending, so top 1 is the next sale after the purchase
) matched_sold
group by
year(matched_sold.sold_date),
month(matched_sold.sold_date);
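To make the "match each purchase to the next sale" idea above concrete, here is a minimal runnable sketch. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, SQLite has no CROSS APPLY so a correlated subquery plays that role, and the bought/sold rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bought (alloy TEXT, bought_date TEXT, amount REAL, price REAL);
CREATE TABLE sold   (alloy TEXT, sold_date TEXT);
-- Hypothetical sample rows, for illustration only.
INSERT INTO bought VALUES
  ('C182', '2016-09-05', 10, 2.0),
  ('C182', '2016-09-20',  5, 3.0),
  ('C180', '2016-10-02',  4, 1.5);
INSERT INTO sold VALUES
  ('C182', '2016-09-25'),
  ('C182', '2016-10-30'),
  ('C180', '2016-10-15');
""")

rows = conn.execute("""
SELECT strftime('%Y-%m', m.sold_date) AS sold_month,
       SUM(m.amount * m.price)        AS total_cost
FROM (
    SELECT b.amount, b.price,
           -- earliest sale of the same alloy strictly after the purchase
           (SELECT MIN(s.sold_date)
              FROM sold AS s
             WHERE s.alloy = b.alloy
               AND s.sold_date > b.bought_date) AS sold_date
    FROM bought AS b
) AS m
WHERE m.sold_date IS NOT NULL  -- like CROSS APPLY: drop purchases with no later sale
GROUP BY sold_month
ORDER BY sold_month
""").fetchall()

for sold_month, total_cost in rows:
    print(sold_month, total_cost)
```

Both C182 purchases land in the September sale because that is the first sale after each of them; the October C182 sale matches nothing in this sample.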

I don't know whether I got your problem right. You want to join two result sets by using a JOIN on a date field, where no exact matches are guaranteed. Possibly you could use the ROW_NUMBER function to generate partitioned row numbers including a sort and then join on the row numbers with e.g. ROW_NR = 1.
TABLE 1                          TABLE 2
ROW_NR  DATE        ID           ROW_NR  DATE        ID
------  ----------  --           ------  ----------  --
1       10/25/2016  1   -match-  1       10/27/2016  1
2       10/24/2016  1
3       10/20/2016  1
4       10/19/2016  1
1       10/23/2016  2   -match-  1       10/28/2016  2
2       10/15/2016  2
3       10/09/2016  2
4       10/08/2016  2
Row Numbering in Table 1:
- Data for TABLE1 with TABLE1.DATE <= TABLE2.DATE
- Partitioned by ID
- Sorted by ID and DATE DESC
Row Numbering in Table 2:
- ROW_NR is always 1
With that you can join implicitly on the data fields without an exact match. Sorry for not providing a SQL Statement.
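Since no statement was given, here is one way the ROW_NUMBER idea could look, applied to the "closest scrap sale on or before the ship date" problem. Treat it as a sketch under assumptions: it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 module rather than on SQL Server, and the shipments and scrap_sales tables, their columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables; not the ERP views from the question.
CREATE TABLE shipments   (serial_no TEXT, alloy TEXT, ship_date TEXT);
CREATE TABLE scrap_sales (alloy TEXT, sale_date TEXT, price_per_lb REAL);
INSERT INTO shipments VALUES
  ('S1', 'C182', '2016-10-17'),
  ('S2', 'C180', '2016-10-17'),
  ('S3', 'C182', '2016-09-01');
INSERT INTO scrap_sales VALUES
  ('C182', '2016-09-10', 1.10),
  ('C182', '2016-10-05', 1.25),
  ('C182', '2016-10-20', 1.40),
  ('C180', '2016-08-30', 0.90);
""")

rows = conn.execute("""
WITH ranked AS (
    SELECT sh.serial_no, sc.sale_date, sc.price_per_lb,
           -- number the candidate scrap sales per shipment, newest first
           ROW_NUMBER() OVER (
               PARTITION BY sh.serial_no
               ORDER BY sc.sale_date DESC
           ) AS row_nr
    FROM shipments AS sh
    LEFT JOIN scrap_sales AS sc
           ON sc.alloy = sh.alloy
          AND sc.sale_date <= sh.ship_date  -- only sales on or before the ship date
)
SELECT serial_no, sale_date, price_per_lb
FROM ranked
WHERE row_nr = 1
ORDER BY serial_no
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps shipments that have no earlier scrap sale (S3 here), so they come back with NULLs instead of disappearing.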

I spent a couple of hours thinking about this question and boiled the sample code down even further and re-posted the question here.
I think it makes more sense and I thank all of you who responded and tried to help answer what I wrote. It helped but I still could not get it to work.

Related

Query Most Recent Records in MS Access Based on Date Provided in Form Field

Let me start by noting I have spent a few days searching through S.O. and have not been able to find a solution. I apologize in advance if the solution is very simple, but I am still learning and appreciate any help I can get.
I have a MS Access 2010 Database, and I am trying to create a set of queries to inform other forms and queries. There are two tables: Borrower Contact Info (BC_Info) and Basic Financial Indicators (BF_Indicators). Each month, I review and track key performance metrics of each borrower. I would like to create a query that supplies the most recent record based on a textbox input (Forms![Portfolio_Review Menu]!Text47).
Two considerations have separated this from other posts I have seen in the 'greatest-n-per-group' tag:
- Not every borrower will have data for every month.
- I need to be able to see back in time, i.e. if it is January 1, 2019 and I want to see the metrics as of July 31, 2017, I want to make sure I am only seeing data from before July 31, 2017 but as close to this date as possible.
Fields are as follows:
BC_Info
- BorrowerName
- PartnerID
BF_Indicators
- Fin_ID
- DateUpdated
The tables are connected by BorrowerName -- which is a unique naming convention used for the primary key of BC_Info.
What I currently have is:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT Fin_ID, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= Forms![Portfolio_Review Menu]!Text47 OR
Forms![Portfolio_Review Menu]!Text47 IS NULL)
GROUP BY Fin_ID
) AS Last_BF ON BFI.Fin_ID = Last_BF.Fin_ID AND
BFI.DateUpdated = Last_BF.MAX_DATE);
This gives me the fields I need, and will keep records out that are past the date given in the textbox, but will give all records from before the textbox input -- not just the most recent.
Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
MEHN-45543 19 18 9/30/2018
Expected Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
As mentioned, I am planning to use the results of this Query to generate further queries that use aggregated information from the Financial Indicators to determine portfolio quality at the time.
Please let me know if there is any other information I can provide. And again, thank you in advance.
Try joining BC_Info to a query that aggregates BF_Indicators on BorrowerName, not Fin_ID. Tested with literal date value:
SELECT BC_Info.*, MaxDate
FROM BC_Info
INNER JOIN
(SELECT BorrowerName, Max(DateUpdated) AS MaxDate
FROM BF_Indicators WHERE DateUpdated <=#12/31/2018# GROUP BY BorrowerName) AS Q1
ON BC_Info.BorrowerName=Q1.BorrowerName;
If you need to include Fin_ID in the results, then:
SELECT BC_Info.*, Fin_ID, DateUpdated FROM BC_Info
INNER JOIN
(SELECT * FROM BF_Indicators WHERE Fin_ID IN
(SELECT TOP 1 Fin_ID FROM BF_Indicators AS Dupe
WHERE Dupe.BorrowerName=BF_Indicators.BorrowerName AND DateUpdated<=#12/31/2018#
ORDER BY Dupe.DateUpdated DESC)
) AS Q1
ON BC_Info.BorrowerName = Q1.BorrowerName;
If you don't like TOP N, adjust your original query:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT BorrowerName, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= #12/31/2018#)
GROUP BY BorrowerName
) AS Last_BF ON BFI.BorrowerName = Last_BF.BorrowerName AND
BFI.DateUpdated = Last_BF.MAX_DATE);
And 1 more to think about:
SELECT BC_Info.PartnerID, BC_Info.BorrowerName, BF_Indicators.Fin_ID, BF_Indicators.DateUpdated
FROM BC_Info RIGHT JOIN BF_Indicators ON BC_Info.BorrowerName = BF_Indicators.BorrowerName
WHERE (((BF_Indicators.DateUpdated)=DMax("DateUpdated","BF_Indicators","BorrowerName='" & [BC_Info].[BorrowerName] & "' AND DateUpdated<=#12/31/2018#")));

SQL join conditional either or not both?

I have 3 tables that I'm joining and 2 variables that I'm using in one of the joins.
What I'm trying to do is figure out how to join based on either of the statements but not both.
Here's the current query:
SELECT DISTINCT
WR.Id,
CAL.Id as 'CalendarId',
T.[First Of Month],
T.[Last of Month],
WR.Supervisor,
WR.cd_Manager as [Manager], --Added to search by the Manager--
WR.[Shift] as 'ShiftId'
INTO #Workers
FROM #T T
--Calendar
RIGHT JOIN [dbo].[Calendar] CAL
ON CAL.StartDate <= T.[Last of Month]
AND CAL.EndDate >= T.[First of Month]
--Workers
--This is the problem join
RIGHT JOIN [dbo].[Worker_Filtered] WR
ON WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
or (WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
AND WR.cd_Manager IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Manager))) --Added to search by the Manager--
AND WR.[Type] = '333E7907-EB80-4021-8CDB-5380F0EC89FF' --internal
WHERE CAL.Id = WR.Calendar
AND WR.[Shift] IS NOT NULL
What I want to do is either have the result based on the Worker_Filtered table matching the @Supervisor or (but not both) have it matching both the @Supervisor and @Manager.
The way it is now if it matches either condition it will be returned. This should be limiting the returned results to Workers that have both the Supervisor and Manager which would be a smaller data set than if they only match the Supervisor.
UPDATE
The query that I have above is part of a greater whole that pulls data for a supervisor's workers.
I want to also limit it to managers that are under a particular supervisor.
For example, if @Supervisor = John Doe and @Manager = Jane Doe and John has 9 workers, 8 of which are under Jane's management, then I would expect the end result to show that there are only 8 workers for each month. With the current query, it is still showing all 9 for each month.
If I change part of the RIGHT JOIN to:
WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
AND WR.cd_Manager IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Manager))
Then it just returns 12 rows of NULL.
UPDATE 2
Sorry, this has taken so long to get a sample up. I could not get SQL Fiddle to work for SQL Server 2008/2014 so I am using rextester instead:
Sample
This shows the results as 108 lines. But what I want to show is just the first 96 lines.
UPDATE 3
I have made a slight update to the Sample. This does get the results that I want. I can set @Manager to NULL and it will pull all 108 records, or I can have the correct Manager name in there and it'll only pull those that match both Supervisor and Manager.
However, I'm doing this with an IF ELSE and I was hoping to avoid doing that as it duplicates code for the insert into the Worker table.
The description of expected results in update 3 makes it all clear now, thanks. Your 'problem' join needs to be:
RIGHT JOIN Worker_Filtered wr on (wr.Supervisor in(@Supervisors)
and case when @Manager is null then 1
else case when wr.Manager in(@Manager) then 1 else 0 end
end = 1)
By the way, I don't know what you are expecting the in(@Supervisors) to achieve, but if you're hoping to supply a comma separated list of supervisors as a single string and have wr.Supervisor match any one of them then you're going to be disappointed. This query works exactly the same if you have = @Supervisors instead.
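For what it's worth, the nested CASE above filters the same rows as the shorter "manager parameter is null OR manager matches" predicate. Below is a small sketch of that optional-filter pattern, assuming a made-up worker table and using SQLite through Python's sqlite3 module instead of SQL Server; the names and rows are illustrative only, built to mirror the 9-versus-8 example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE worker (id INTEGER PRIMARY KEY, supervisor TEXT, manager TEXT)"
)
# Hypothetical data: nine workers under John Doe, eight of them managed by Jane Doe.
workers = [("John Doe", "Jane Doe")] * 8 + [("John Doe", "Sam Roe")]
conn.executemany("INSERT INTO worker (supervisor, manager) VALUES (?, ?)", workers)


def count_workers(supervisor, manager=None):
    """Count a supervisor's workers, optionally restricted to one manager."""
    sql = """
        SELECT COUNT(*)
        FROM worker
        WHERE supervisor = :supervisor
          -- when no manager is supplied the second condition is skipped
          AND (:manager IS NULL OR manager = :manager)
    """
    params = {"supervisor": supervisor, "manager": manager}
    return conn.execute(sql, params).fetchone()[0]


print(count_workers("John Doe"))              # all of John's workers
print(count_workers("John Doe", "Jane Doe"))  # only those under Jane
```

This keeps a single query for both cases, which avoids the duplicated IF ELSE branches mentioned in update 3.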

PostgreSQL - Filter column 2 results based on column 1

Forgive a novice question. I am new to postgresql.
I have a database full of transactional information. My goal is to iterate through each day since the first transaction, and show how many unique users made a purchase on that day, or in the 30 days previous to that day.
So the # of unique users on 02/01/2016 should show all unique users from 01/01/2016 through 02/01/2016. The # of unique users on 02/02/2016 should show all unique users from 01/02/2016 through 02/02/2016.
Here is a fiddle with some sample data: http://sqlfiddle.com/#!15/b3d90/1
The result should be something like this:
December 17 2014 -- 1
December 18 2014 -- 2
December 19 2014 -- 3
...
January 13 2015 -- 16
January 19 2015 -- 15
January 20 2015 -- 15
...
The best I've come up with is the following:
SELECT
to_char(S.created, 'YYYY-MM-DD') AS my_day,
COUNT(DISTINCT
CASE
WHEN S.created > S.created - INTERVAL '30 days'
THEN S.user_id
END)
FROM
transactions S
GROUP BY my_day
ORDER BY my_day;
As you can see, I have no idea how I could reference what exists in column one in order to specify what date range should be included in the filter.
Any help would be much appreciated!
I think if you do a self-join, it would give you the results you seek:
select
t1.created,
count (distinct t2.user_id)
from
transactions t1
join transactions t2 on
t2.created between t1.created - interval '30 days' and t1.created
group by
t1.created
order by
t1.created
That said, I think this is going to do some form of a cartesian join in the background, so for large datasets I doubt it's very efficient. If you run into huge performance problems, there are ways to make this a lot faster... but before you address that, find out if you need to.
-- EDIT 8/20/16 --
In response to your issue with the performance of this... yes, it's a pig. I admit it. I encountered a similar issue here:
PostgreSQL Joining Between Two Values
The same concept for your example is this:
with xtrans as (
select created, created + generate_series(0, 30) as create_range, user_id
from transactions
)
select
t1.created,
count (distinct t2.user_id)
from
transactions t1
join xtrans t2 on
t2.create_range = t1.created
group by
t1.created
order by
t1.created
It's not as easy to follow, but it should yield identical results, only it will be significantly faster because it's not doing the "glorified cross join."
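If you want to check the windowing logic on a tiny dataset before running it against the real table, a sketch like the following may help. It assumes SQLite via Python's sqlite3 module rather than PostgreSQL (so the interval arithmetic becomes date(x, '-30 days')), dates stored as ISO text, and invented rows; it implements the range self-join from the first version of this answer, not the generate_series variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (user_id INTEGER, created TEXT);
-- Hypothetical sample rows, for illustration only.
INSERT INTO transactions VALUES
  (1, '2016-01-01'),
  (2, '2016-01-15'),
  (1, '2016-01-31'),
  (3, '2016-02-01'),
  (2, '2016-02-20');
""")

rows = conn.execute("""
SELECT t1.created,
       COUNT(DISTINCT t2.user_id) AS users_last_30_days
FROM (SELECT DISTINCT created FROM transactions) AS t1
JOIN transactions AS t2
  -- every transaction in the 30 days up to and including t1.created
  ON t2.created BETWEEN date(t1.created, '-30 days') AND t1.created
GROUP BY t1.created
ORDER BY t1.created
""").fetchall()

for created, users in rows:
    print(created, users)
```

Like the query above, this only reports days on which at least one transaction exists; days with no transactions would need a generated calendar to show up.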

SQL - Reference Table has Information based on Date

So I have a reference table which stores the primary key, description and update date columns. Something like this
SELECT * FROM tblReasonRef
ReasonCode Description UpdateDate
27 Lunch 2010-12-01
24 Meeting 2010-12-01
20 SpecialProj 2010-12-01
The other day, the code description was changed. So now the query returns the following...
ReasonCode Description UpdateDate
27 Lunch 2010-12-01
24 Meeting 2010-12-01
20 SpecialProj 2010-12-01
27 Training 2012-06-22
24 Meeting 2012-06-22
20 Lunch 2012-06-22
The source data tracks every 30 minutes what state a staff member might go into, so you would have the following query...
SELECT * FROM tblhActivity
MemberID Date Time ReasonCode ReasonDuration
10922 2012-06-21 1200 27 100
10922 2012-06-21 1500 24 1800
10922 2012-06-25 1230 27 100
So originally, the query I had was...
SELECT a.MemberID, a.Date, a.Time, r.Description, a.ReasonDuration
FROM tblhActivity a
INNER JOIN tblReasonRef r ON a.ReasonCode = r.ReasonCode
Which worked fine until the change on the 22nd. Now I have two definitions of each code. The question is, how can I create a query that will pick the right description depending on the date?
For example, I know that when the date is the 21st, the description for code 27 should be Lunch. On the 25th, the description returned should be Training.
Keep in mind also that this will probably happen again when codes are added to the reference table. I am thinking the join should also be on UpdateDate, but then I have to know the start and end date of each reference code. Is there a simple solution?
You really need the start and end dates for the period in which a particular reason is applicable. You can either modify your tblReasonRef to include these dates (best option) or you will need to calculate them.
The following query will calculate an end date for each reason as the day before a new entry for the ReasonCode is added.
SELECT ReasonCode
,Description
,UpdateDate StartDate
,DATEADD(d, -1, UpdateDate) PreviousEntryEndDate
,ROW_NUMBER() OVER(PARTITION BY ReasonCode ORDER BY UpdateDate) AS Row
INTO #reason
FROM tblReasonRef
SELECT a.MemberID
,a.Date
,a.Time
,reason.ReasonCode
,reason.Description
,a.ReasonDuration
FROM tblhActivity a
INNER JOIN #reason reason
ON a.ReasonCode = reason.ReasonCode
LEFT JOIN #reason nextReason
ON reason.Row = nextReason.Row - 1
AND reason.ReasonCode = nextReason.ReasonCode
WHERE a.Date BETWEEN reason.StartDate AND ISNULL(nextReason.PreviousEntryEndDate, a.Date)
DROP TABLE #reason
If you modify your table tblReasonRef, like this:
ReasonCode, Description, StartDate, EndDate
you can do this SQL Query:
SELECT a.MemberID, a.Date, a.Time, r.Description, a.ReasonDuration
FROM tblhActivity a
INNER JOIN tblReasonRef r ON a.ReasonCode = r.ReasonCode
WHERE a.Date between r.StartDate and r.EndDate
Remember to keep your code and your model simple.
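If changing tblReasonRef is not an option, another way to pick the description that was in effect on the activity date is to join on the latest UpdateDate that is not after that date. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module instead of SQL Server, and it loads the sample rows from the question with dates as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblReasonRef (ReasonCode INTEGER, Description TEXT, UpdateDate TEXT);
CREATE TABLE tblhActivity (MemberID INTEGER, Date TEXT, Time TEXT,
                           ReasonCode INTEGER, ReasonDuration INTEGER);
INSERT INTO tblReasonRef VALUES
  (27, 'Lunch',    '2010-12-01'), (24, 'Meeting', '2010-12-01'),
  (20, 'SpecialProj', '2010-12-01'),
  (27, 'Training', '2012-06-22'), (24, 'Meeting', '2012-06-22'),
  (20, 'Lunch',    '2012-06-22');
INSERT INTO tblhActivity VALUES
  (10922, '2012-06-21', '1200', 27, 100),
  (10922, '2012-06-21', '1500', 24, 1800),
  (10922, '2012-06-25', '1230', 27, 100);
""")

rows = conn.execute("""
SELECT a.MemberID, a.Date, a.Time, r.Description, a.ReasonDuration
FROM tblhActivity AS a
JOIN tblReasonRef AS r
  ON r.ReasonCode = a.ReasonCode
 -- keep only the reference row that was current on the activity date
 AND r.UpdateDate = (SELECT MAX(r2.UpdateDate)
                       FROM tblReasonRef AS r2
                      WHERE r2.ReasonCode = a.ReasonCode
                        AND r2.UpdateDate <= a.Date)
ORDER BY a.Date, a.Time
""").fetchall()

for row in rows:
    print(row)
```

With the question's data, code 27 resolves to Lunch on the 21st and to Training on the 25th, and no start or end date columns are needed.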

T-SQL - Getting most recent date and most recent future date

Assume the table of records below
ID Name AppointmentDate
-- -------- ---------------
1 Bob 1/1/2010
1 Bob 5/1/2010
2 Henry 5/1/2010
2 Henry 8/1/2011
3 John 8/1/2011
3 John 12/1/2011
I want to retrieve the most recent appointment date by person. So I need a query that will give the following result set.
1 Bob 5/1/2010 (5/1/2010 is most recent)
2 Henry 8/1/2011 (8/1/2011 is most recent)
3 John 8/1/2011 (has 2 future dates but 8/1/2011 is most recent)
Thanks!
Assuming that where you say "most recent" you mean "closest", as in "stored date is the fewest days away from the current date and we don't care if it's before or after the current date", then this should do it (trivial debugging might be required):
SELECT ID, Name, AppointmentDate
from (select
ID
,Name
,AppointmentDate
,row_number() over (partition by ID order by abs(datediff(dd, AppointmentDate, getdate()))) Ranking
from MyTable) xx
where Ranking = 1
This uses the row_number() function, available in SQL Server 2005 and up. The subquery "orders" the data as per the specifications, and the main query picks the best fit.
Note also that:
- The search is based on the current date
- We're only calculating the difference in days; time (hours, minutes, etc.) is ignored
- If two dates are equidistant (say, 2 before and 2 after), we pick one arbitrarily
All of which could be adjusted based on your final requirements.
(Phillip beat me to the punch, and windowing functions are an excellent choice. Here's an alternative approach:)
Assuming I correctly understand your requirement as getting the date closest to the present date, whether in the past or future, consider this query:
SELECT t.Name, t.AppointmentDate
FROM
(
    SELECT Name, AppointmentDate, ABS(DATEDIFF(d, GETDATE(), AppointmentDate)) AS Distance
    FROM MyTable
) t
JOIN
(
    SELECT Name, MIN(ABS(DATEDIFF(d, GETDATE(), AppointmentDate))) AS MinDistance
    FROM MyTable
    GROUP BY Name
) d ON t.Name = d.Name AND t.Distance = d.MinDistance
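Because both queries depend on GETDATE(), their results change from day to day, which makes them awkward to verify. Here is a small sketch of the same "closest date per person" ranking with the reference date passed in as a parameter. Assumptions: SQLite (3.25 or newer) through Python's sqlite3 module instead of SQL Server, a table named appointments, the question's dates read as month/day/year and stored as ISO text, and a reference date of 2011-03-01 chosen only because it reproduces the expected result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointments (ID INTEGER, Name TEXT, AppointmentDate TEXT);
INSERT INTO appointments VALUES
  (1, 'Bob',   '2010-01-01'), (1, 'Bob',   '2010-05-01'),
  (2, 'Henry', '2010-05-01'), (2, 'Henry', '2011-08-01'),
  (3, 'John',  '2011-08-01'), (3, 'John',  '2011-12-01');
""")


def closest_appointments(as_of):
    """Return, per person, the appointment closest to as_of (past or future)."""
    sql = """
        SELECT ID, Name, AppointmentDate
        FROM (
            SELECT ID, Name, AppointmentDate,
                   ROW_NUMBER() OVER (
                       PARTITION BY ID
                       -- distance in days; ties go to the earlier date
                       ORDER BY ABS(julianday(AppointmentDate) - julianday(:as_of)),
                                AppointmentDate
                   ) AS ranking
            FROM appointments
        )
        WHERE ranking = 1
        ORDER BY ID
    """
    return conn.execute(sql, {"as_of": as_of}).fetchall()


for row in closest_appointments("2011-03-01"):
    print(row)
```

The second ORDER BY key is an added tie-break so that equidistant dates are resolved the same way on every run.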
