I'm working on an MRP simulation in which I have to subtract demand or add supply qty to available stock and I hope you can be of support. Find below the result I want to achieve.
I have 1 value for stock = 22 and a lot of values for future demand/supply on specific dates.
Part     Stock                        Demand/Supply qty  Demand/Supply Date  Result
1000680  22                           -1                 2023-01-01          21
1000680  21 (what I want to achieve)  -15                2023-01-02          6 (expected outcome)
1000680  6 (what I want to achieve)   +10                2023-01-03          16 (expected outcome)
I'm still on the SQL learning curve. I started to add rownumbers to the lines to make sure that the sequence is correct:
select
    part,
    rownum = ROW_NUMBER() OVER (ORDER BY part, mrp_due_date),
    current_stock_qty,
    demand_supply_qty,
    current_stock - qty as new_stock_qty, -- if demand
    current_stock + qty as new_stock_qty, -- if supply
    mrp_due_date
from #base
Then I tried the LAG function to pick up the previous row's new_stock_qty, but this only worked for the first line.
So I probably need the loop function to first calculate stock-demand and use the result as new stock.
I have looked through similar questions asked on this site, but I find it difficult to define my solution based on that information.
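The "use the result as the new stock" idea is a running total: each result is the opening stock plus the sum of all demand/supply quantities up to that date. In SQL this is commonly written as a windowed SUM(...) OVER (PARTITION BY part ORDER BY date) added to the opening stock rather than as a loop. Below is a minimal Python sketch of the same arithmetic on the sample rows; the names movements, opening_stock and running_stock are made up for illustration.

```python
from itertools import accumulate

# Sample rows from the question: (part, signed qty, due date), already sorted by date.
movements = [
    ("1000680", -1, "2023-01-01"),
    ("1000680", -15, "2023-01-02"),
    ("1000680", +10, "2023-01-03"),
]
opening_stock = 22


def running_stock(opening, rows):
    """Return the stock level after each movement, carrying each result forward."""
    changes = [qty for _, qty, _ in rows]
    # accumulate(..., initial=opening) yields the opening value first, so drop it.
    return list(accumulate(changes, initial=opening))[1:]


results = running_stock(opening_stock, movements)
for (part, qty, due), stock in zip(movements, results):
    print(part, due, qty, stock)
```

With several parts, the same calculation would be run once per part, which is what PARTITION BY does in the SQL form.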
I'm trying to figure out the number of working days between two dates. The table (dfDates) is laid out as follows:
Key  StartDateKey  EndDateKey
1    20171227      20180104
2    20171227      20171229
I have another table (dfDimDate) with all the relevant date keys and whether the date key is a working day or not:
DateKey   WorkDayFlag
20171227  1
20171228  1
20171229  1
20171230  0
20171231  0
20180101  0
20180102  1
20180103  1
20180104  1
I'm expecting a result as so:
Key  WorkingDays
1    6
2    3
So far (I realise this isn't complete to get me the above result), I've written this:
workingdays = []
for i in range(0, len(dfDates)):
    value = dfDimDate.filter((dfDimDate.DateKey >= dfDates.collect()[i][1]) & (dfDimDate.DateKey <= df.collect()[i][2])).agg({'WorkDayFlag': 'sum'})
    workingdays.append(value.collect())
However, only null values are being returned. Also, I've noticed this is very slow and took 54 seconds before it errored.
I think I understand what the error is about but I'm not sure how to fix it. Also, I'm not sure how to optimise the command so it runs faster. I'm looking for a solution in pyspark or spark SQL (whichever is easiest).
Many thanks,
Carolina
Edit: The error below was resolved thanks to a suggestion from #samkart who said to put the agg after the filter
AnalysisException: Resolved attribute(s) DateKey#17075 missing from sum(WorkDayFlag)#22142L in operator !Filter ((DateKey#17075 <= 20171228) AND (DateKey#17075 >= 20171227)).;
A possible and simple solution:
from pyspark.sql import functions as F
dfDates \
    .join(dfDimDate, dfDimDate.DateKey.between(dfDates.StartDateKey, dfDates.EndDateKey)) \
    .groupBy(dfDates.Key) \
    .agg(F.sum(dfDimDate.WorkDayFlag).alias('WorkingDays'))
That is, first join the two datasets so that each dfDates row is linked with every dfDimDate row in its range (dfDates.StartDateKey <= dfDimDate.DateKey <= dfDates.EndDateKey).
Then group the joined dataset by Key and sum WorkDayFlag, which gives the number of working days in each range.
In the solution you proposed, you are performing the calculation directly on the driver, so you are not taking advantage of the parallelism that spark offers. This should be avoided when possible, especially for large datasets.
Apart from that, you are requesting repeated collects in the for-loop, even for the same data, resulting in a further slowdown.
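To check the join-then-aggregate logic against the expected result without a Spark session, here is a minimal pure-Python sketch on the sample data. It only mirrors the range join and the grouped sum; it is not how Spark executes the query, and the names df_dates, dim_date and working_days are made up for illustration.

```python
from collections import defaultdict

# Sample data from the question.
df_dates = [(1, 20171227, 20180104), (2, 20171227, 20171229)]  # (Key, StartDateKey, EndDateKey)
dim_date = [  # (DateKey, WorkDayFlag)
    (20171227, 1), (20171228, 1), (20171229, 1),
    (20171230, 0), (20171231, 0), (20180101, 0),
    (20180102, 1), (20180103, 1), (20180104, 1),
]


def working_days(dates, dim):
    """Range-join each (Key, start, end) row to the date dimension and sum the flags per Key."""
    totals = defaultdict(int)
    for key, start, end in dates:
        for date_key, flag in dim:
            if start <= date_key <= end:  # same condition as DateKey.between(start, end)
                totals[key] += flag
    return dict(totals)


result = working_days(df_dates, dim_date)
print(result)
```

As with the inner join in the answer, a Key whose range matches no dimension rows would simply be absent from the result.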
In this example, there are 5 periods of actual balances and the implied depreciation rates. Starting in Period 6, I need the Balance to be calculated as the previous period's balance ($8,177,481) * (1 + the current period's depreciation rate (-1.50%)), and so on. I've heard recursive CTEs can do this, but I am not familiar with them.
Period DeprRate Balance Comment
1 0% $10,000,000 Actual Values
2 -1.62% $9,838,000 Actual Values
3 -7.41% $9,109,004 Actual Values
4 -8.00% $8,380,284 Actual Values
5 -2.42% $8,177,481 Actual Values
6 -1.50% null should be $8,177,481*(1-.015)
7 -1.50% null should be Pd 6 Calc Balance *(1-.015)
8 -5.73% null should be Pd 7 Calc Balance *(1-.0573)
9 -4.13% null should be Pd 8 Calc Balance *(1-.0413)
10 -1.50% null should be Pd 9 Calc Balance *(1-.015)
CREATE TABLE Table1
([Period] int, [DeprRate] float, Balance integer)
;
INSERT INTO Table1
([Period], [DeprRate], Balance)
VALUES
(1,0,10000000),
(2,-0.0162,9838000),
(3,-0.0741,9109004.2),
(4,-0.08,8380283.864),
(5,-0.0242,8177480.9944912),
(6,-0.015,null),
(7,-0.015,null),
(8,-0.0573,null),
(9,-0.0413,null),
(10,-0.015,null)
"This seems relatively easy, but can't get it."
Yes, it is. Did you follow these steps?
"I have 10 periods of actual balances and the implied depreciation rates."
Step 1 : Create a table (Table_1) and populate it with these values.
" Starting in Period 11, need the Balance to be calculated on previous period balance * the current period depreciation rate."
Step 2 : Create a query for calculation of new rates based on the values of previous table, execute it and populate it to new table (Table_2).
" Period 11 isn't difficult if that's all that was needed by using lag. Problem is Period 12-20 need to be calculating current period balance on previous period calculated balance multiplied by the current period depreciation rate."
Step 3 : Two options here - one is through a recursive query as 'Vinit' commented. Another option (easy) is to repeat Step 2 and append to Table_2.
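The reason a plain LAG is not enough from the second projected period onwards is that each balance depends on the previously calculated balance, not on a stored one. A minimal Python sketch of that carry-forward, using the rates and the last actual balance from the question's sample data (the names rates, last_actual and project_balances are made up for illustration):

```python
# Depreciation rates for the periods without an actual balance (from the question).
rates = {6: -0.015, 7: -0.015, 8: -0.0573, 9: -0.0413, 10: -0.015}
last_actual = 8177480.9944912  # Period 5 balance


def project_balances(start_balance, rates_by_period):
    """Carry the balance forward: balance[n] = balance[n-1] * (1 + rate[n])."""
    balances = {}
    balance = start_balance
    for period in sorted(rates_by_period):
        balance = balance * (1 + rates_by_period[period])
        balances[period] = balance
    return balances


projected = project_balances(last_actual, rates)
for period, balance in projected.items():
    print(period, round(balance, 2))
```

A recursive query performs the same step once per period, with the previous iteration's row playing the role of the balance variable here.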
=======
Knowledge sharing / value addition to your question: Depreciation is an accounting concept, which is usually taken into account either at year end (closing of the books) or at the end of an asset's life. The concept is tricky because at least two (usually different) calculations may have to be performed, one to satisfy tax compliance and one for management accounting requirements. Additional calculations may also have to be carried out for each type of asset, just to determine the best possible option.
Though you did not include a date column in your sample data, you should write the script so that it calculates and populates the depreciated values based on a particular date. You can also arrange to execute this script through a trigger or through a job agent (scheduling).
Hope this helps.
I have a table that has an auto-incrementing identity "Reference" field and a pair of other fields that determine the sort order. What I need to do is find the 'next' item in the table when sorted based on the pair of fields based on the reference field of an initial item.
So my data looks like this when sorted by SortParent.SortChild:
Reference SortParent SortChild Data
------------------------------------------
9 1 2 Fred
7 1 3 Jim
11 1 4 Sheila
4 2 1 Micro
5 2 2 Archimedes
12 2 3 Electron
So in this example the "Jim" row (Reference=7) comes after "Fred" (Reference=9) even though its reference is smaller.
So I want to be able to find which row comes after Fred by searching based on Fred's reference.
At the moment in code I do a query to find the values for Fred's row:
SELECT SortParent,SortChild From MyTable WHERE Reference=9
Which returns 1,2. Then do a search for the first row that comes after 1,2:
SELECT * FROM MyTable
WHERE ((SortParent=1 and SortChild>2) OR (SortParent>1))
ORDER BY SortParent,SortChild
Which will therefore come back with the row having reference 7 and sort values 1,3
I'm pretty sure this can be done in a single query, but I'm stumped on the best way.
Incidentally, if anyone has any suggestions on an alternate way of handling the two-part sort columns that would make this easier, please feel free to help!
I believe you are looking for the LEAD or LAG windowed function:
https://msdn.microsoft.com/en-US/library/hh213125.aspx
SELECT
    NextReference
FROM
    (SELECT
        reference
        , LEAD(reference, 1, 0) OVER (ORDER BY SortParent, SortChild) AS NextReference
    FROM
        mytable
    ) newTable
WHERE
    reference = 9
I used LEAD, but try it with LAG if you are looking in the other direction for the row.
I haven't tested this particular query, so it may not be syntactically sound, but let me know if you have any trouble with it and I'll go over it a bit more once I'm back at my desk.
EDIT: Used the wrong sql from your question as my base
EDIT2: Put the lead into a subquery to allow us to query on it
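For anyone unsure what LEAD(reference, 1, 0) OVER (ORDER BY SortParent, SortChild) returns, here is a minimal Python sketch of the same idea on the question's sample rows: sort by the two sort columns, then look one row ahead. The function name next_reference is made up for illustration.

```python
# Sample rows from the question.
rows = [
    {"Reference": 9, "SortParent": 1, "SortChild": 2, "Data": "Fred"},
    {"Reference": 7, "SortParent": 1, "SortChild": 3, "Data": "Jim"},
    {"Reference": 11, "SortParent": 1, "SortChild": 4, "Data": "Sheila"},
    {"Reference": 4, "SortParent": 2, "SortChild": 1, "Data": "Micro"},
    {"Reference": 5, "SortParent": 2, "SortChild": 2, "Data": "Archimedes"},
    {"Reference": 12, "SortParent": 2, "SortChild": 3, "Data": "Electron"},
]


def next_reference(rows, reference, default=0):
    """Return the Reference of the row that follows `reference` in sort order.

    Returns `default` when the row is last (like LEAD's third argument)
    and also when `reference` is not present at all.
    """
    ordered = sorted(rows, key=lambda r: (r["SortParent"], r["SortChild"]))
    for current, following in zip(ordered, ordered[1:]):
        if current["Reference"] == reference:
            return following["Reference"]
    return default


print(next_reference(rows, 9))   # row after Fred
print(next_reference(rows, 12))  # last row, falls back to the default
```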
I want something a bit unusual here. I have a table named EMP_INFO which contains a few details of each employee (i.e. Name, Designation, JOIN_FROM, JOIN_TO). I am trying to figure out the term for each employee on a yearly basis. I have the following kind of data:
EMP_ID EMP_DESIG JOIN_FROM JOIN_TO Query Result
1 Supervisor 01-05-11 30-04-13 Should Display
2 Supervisor 15-06-10 31-12-12 Should Display
3 Jobar 01-01-12 31-12-13 Should Display
4 SR Superior 01-12-11 31-12-15 Should Display
5 Supervisor 01-05-11 31-12-13 Should Display
6 Supervisor 01-05-11 31-12-13 Should Display
7 Supervisor 01-05-11 31-12-13 Should Display
8 Supervisor 01-02-12 15-06-13 Should Display
9 SR Superior 16-03-10 18-11-11 Should Display
10 SR Superior 16-06-05 18-11-11 Should Display
11 Jobar 30-11-11 31-12-13 Should Display
12 Superior 02-02-05 31-12-20 Should Display
13 Jobar 30-11-11 31-12-13 Should Display
14 Jobar 30-11-09 31-12-10 Should Not Display
Basically, my report has a date range, let's say From "01-Jun-11" To "31-Dec-13". From the record set above, the report should retrieve every record whose JOIN_FROM to JOIN_TO term touches this date range.
I have tried the BETWEEN syntax, but I believe it will not work.
If anyone can help me with this, it would be appreciated.
Thanks in advance. If these details are not enough to understand the problem, let me know and I will add more.
Modified
Query which I tried
SELECT EI.*
FROM EMP_INFO EI,
(SELECT
TO_DATE('01-JUN-2011','DD-MON-YYYY') A,
TO_DATE('31-DEC-2013','DD-MON-YYYY') B FROM DUAL) X
WHERE
(EI.JOIN_FROM IS NOT NULL AND EI.JOIN_TO IS NOT NULL)
AND (
X.A BETWEEN EI.JOIN_FROM AND EI.JOIN_TO
AND X.B BETWEEN EI.JOIN_FROM AND EI.JOIN_TO
OR (EI.JOIN_FROM >= X.B AND EI.JOIN_TO <=X.A) )
Modified: added a column (Query Result) to the table above, which contains the expected result for each record.
So you simply want all records where the join time is in the given time range? That would be:
SELECT *
FROM EMP_INFO
WHERE JOIN_FROM BETWEEN
TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN') AND
TO_DATE('31-DEC-2013','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
AND JOIN_TO BETWEEN
TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN') AND
TO_DATE('31-DEC-2013','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN');
EDIT: Sorry, I got it now. You are looking for all time ranges that overlap with the given range. That would be: ranges that start before and end within, ranges that start before and end after, ranges that start within and end within, and ranges that start within and end after. Another way to express this is: either the given time range's start is within the other time range, or the other time range's start is within the given time range. Here is the corresponding statement:
SELECT *
FROM EMP_INFO
WHERE JOIN_FROM BETWEEN
TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN') AND
TO_DATE('31-DEC-2013','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
OR TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
BETWEEN JOIN_FROM AND JOIN_TO;
And here is the SQL fiddle: http://sqlfiddle.com/#!4/b58b3/3
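The overlap rule can be checked on a few of the sample rows with a short Python sketch. It assumes the two-digit years in the question mean 20xx and that both ends of every range are inclusive. The function overlaps_as_in_answer mirrors the statement above; overlaps_compact is a commonly used shorter test added here for comparison, not something taken from the answer.

```python
from datetime import date

RANGE_START = date(2011, 6, 1)
RANGE_END = date(2013, 12, 31)

# A few rows from the question: EMP_ID -> (JOIN_FROM, JOIN_TO).
employees = {
    1: (date(2011, 5, 1), date(2013, 4, 30)),     # starts before, ends within
    3: (date(2012, 1, 1), date(2013, 12, 31)),    # starts within, ends within
    12: (date(2005, 2, 2), date(2020, 12, 31)),   # starts before, ends after
    14: (date(2009, 11, 30), date(2010, 12, 31)), # ends before the range starts
}


def overlaps_as_in_answer(join_from, join_to, start, end):
    """JOIN_FROM is within the given range, or the range start is within the term."""
    return (start <= join_from <= end) or (join_from <= start <= join_to)


def overlaps_compact(join_from, join_to, start, end):
    """Equivalent test: each range starts no later than the other one ends."""
    return join_from <= end and join_to >= start


for emp_id, (join_from, join_to) in employees.items():
    print(emp_id, overlaps_as_in_answer(join_from, join_to, RANGE_START, RANGE_END))
```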
Convert to same format and compare. There may be a time component in the dates stored in database. Previous answer was wrong.
The problem that I have is SQL Server Reporting Services does not like Sum(First()) notation. It will only allow either Sum() or First().
The Context
I am creating a reconciliation report, i.e. what stock we had at the start of a period, what was ordered, and what stock we had at the end.
Dataset returns something like
Type,Product,Customer,Stock at Start(SAS), Ordered Qty, Stock At End (SAE)
Export,1,1,100,5,90
Export,1,2,100,5,90
Domestic,2,1,200,10,150
Domestic,2,2,200,20,150
Domestic,2,3,200,30,150
I group by Type, then Product and list the customers that bought that product.
I want to display the total for SAS, Ordered Qty, and SAE but if I do a Sum on the SAS or SAE I get a value of 200 and 600 for Product 1 and 2 respectively when it should have been 100 and 200 respectively.
I thought that I could do a Sum(First()), but SSRS complains that I cannot have an aggregate within an aggregate.
Ideally SSRS needs a Sum(Distinct())
Solutions So Far
1. Don't show the Stock at Start and Stock At End as part of the totals.
2. Write some code directly in the report to do the calc. tried this one - didn't work as I expected.
3. Write an assembly to do the calculation. (Have not tried this one)
Edit - Problem clarification
The problem stems from the fact that this is actually two reports merged into one (as I see it). A Production Report and a sales report.
The report tried to address these criteria
the market that we sold it to (export, domestic)
how much did we have in stock,
how much was produced,
how much was sold,
who did we sell it to,
how much do we have left over.
The complicating factor is who we sold it to. Without that, it would have been relatively easy. But including it means that the other top-line figures (stock at start and stock at end) have nothing to do with what is sold, other than the particular product.
I had a similar issue and ended up using ROW_NUMBER in my query to provide an integer for the row value and then using SUM(IIF(myRowNumber = 1, myValue, 0)).
I'll edit this when I get to work and provide more data, but thought this might be enough to get you started. I'm curious about Adolf's solution too.
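To show what the ROW_NUMBER plus SUM(IIF(...)) trick computes, here is a minimal Python sketch on the question's dataset. It assumes the row number restarts for each Type/Product pair (the equivalent of PARTITION BY Type, Product); the helper names are made up for illustration.

```python
# Dataset from the question: (Type, Product, Customer, SAS, Ordered Qty, SAE).
rows = [
    ("Export", 1, 1, 100, 5, 90),
    ("Export", 1, 2, 100, 5, 90),
    ("Domestic", 2, 1, 200, 10, 150),
    ("Domestic", 2, 2, 200, 20, 150),
    ("Domestic", 2, 3, 200, 30, 150),
]
SAS, ORDERED, SAE = 3, 4, 5  # column positions


def add_row_numbers(rows):
    """Append a row number that restarts at 1 for each (Type, Product) pair."""
    counters = {}
    numbered = []
    for row in rows:
        key = (row[0], row[1])
        counters[key] = counters.get(key, 0) + 1
        numbered.append(row + (counters[key],))
    return numbered


def sum_first_only(numbered, column):
    """Equivalent of SUM(IIF(myRowNumber = 1, myValue, 0))."""
    return sum(row[column] if row[-1] == 1 else 0 for row in numbered)


numbered = add_row_numbers(rows)
print("naive SAS total:", sum(row[SAS] for row in rows))
print("SAS total counting each product once:", sum_first_only(numbered, SAS))
print("Ordered Qty total:", sum(row[ORDERED] for row in rows))
```

Ordered Qty is still summed normally, because it genuinely varies per customer row.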
Pooh! Where's my peg?!
Have you thought about using windowing/ranking functions in the SQL for this?
This allows you to aggregate data without losing detail
e.g. Imagine for a range of values, you want the Min and Max returning, but you also wish to return the initial data (no summary of data).
Group Value Min Max
A 3 2 9
A 7 2 9
A 9 2 9
A 2 2 9
B 5 5 7
B 7 5 7
C etc..
The syntax looks odd, but it's just
AggregateFunctionYouWant OVER (PARTITION BY WhatYouWantItGroupedBy ORDER BY WhatYouWantItOrderedBy) AS AggVal
Windowing
Ranking
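As a rough illustration of "aggregate without losing detail", the following Python sketch reproduces the Min/Max table above for groups A and B: every input row is kept and the per-group aggregate is attached to it, which is what MIN(Value) OVER (PARTITION BY Group) does. The function name with_group_min_max is made up for illustration.

```python
from collections import defaultdict

# (Group, Value) rows from the example table.
values = [("A", 3), ("A", 7), ("A", 9), ("A", 2), ("B", 5), ("B", 7)]


def with_group_min_max(rows):
    """Attach each group's min and max to every row, keeping all detail rows."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return [(group, value, min(groups[group]), max(groups[group])) for group, value in rows]


for row in with_group_min_max(values):
    print(row)
```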
Your dataset is a little weird, but I think I understand where you're going.
Try making the dataset return in this order:
Type, Product, SAS, SAE, Customer, Ordered Qty
What I would do is create a report with a table control. I would set up the type, product, and customer as three separate groups. I would put the SAS and SAE data on the same group as the product, and the quantity on the customer group. This should resemble what I believe you are trying to go for. Your SAS and SAE should be in a First().
Write a subquery.
"Ideally SSRS needs a Sum(Distinct())"
Re-write your query to do this correctly.
I suspect your problem is that you've written a query that gets you the wrong results, or you have poorly designed tables. Without knowing more about what you're trying to do, I can't tell you how to fix it, but it has a bad "smell".