I am stuck on a query I need to write, and I am not sure what the task even means.
I need to find a vehicle number in a table called Vehicle_Details and check whether that vehicle is already in use during the period I would like to use it, as described below.
When a new trip is being arranged, the administrator has to find a vehicle that is not already in use for the trip duration. Because this query will be needed frequently, it is important that it can be run easily for an arbitrary start and end date. It should therefore use substitution variables so that the trip start and end dates can be provided at run time.
Make sure that when it is run the user is only prompted to supply the start and end dates once. Also make sure that any vehicles displayed are available for the entire period specified - you will have to include more than one test in the where clause
Any code would be helpful, but even links to material that would help me write it myself would be welcome, as I have no idea where to start.
Example data from the three tables:
Trip_ID Departure Return_Date Duration Registration
73180 07-FEB-12 08-FEB-12 1 PY09 XRH
73181 07-FEB-12 08-FEB-12 1 PY10 OPM
73182 07-FEB-12 10-FEB-12 3 PY56 BZT
73183 07-FEB-12 08-FEB-12 1 PY56 BZU
73184 07-FEB-12 09-FEB-12 2 PY58 UHF
Registration Make Model Year
4585 AW ALBION RIEVER 1963
SDU 567M ATKINSON N/A 1974
P525 CAO DAF FT85.400 1996
PY55 CGO DAF FTGCF85.430 2005
PY06 BYP DAF FTGCF85.430 2006
Weight Registration Body Vehicle ID
20321 4585 AW N/A 1
32520 SDU 567M N/A 2
40000 P525 CAO N/A 3
40000 PY55 CGO N/A 4
40000 PY06 BYP N/A 5
You need to find if a particular date range intersects with any reservation. This is interval intersection arithmetic. Consider the following intervals [A,B] and [x,y]:
-----------[xxxxxxxxxxx]-------------
           A           B
---------------------[xxxxxx]--------
                     x      y
An interval [x,y] will intersect with [A,B] if and only if:
B >= x
And A <= y
So your query will look like:
SELECT *
  FROM registrations reg
 WHERE reg.registration = :searched_vehicle
   AND NOT EXISTS (SELECT NULL
                     FROM reservations res
                    WHERE res.registration = reg.registration
                      AND res.return_date >= :interval_start
                      AND res.departure <= :interval_end)
This is for one vehicle. If the query returns a row, this vehicle is available for the given interval [:interval_start, :interval_end].
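To list every free vehicle rather than test a single one, the same NOT EXISTS test can be run without the registration filter. The following is only a minimal sketch run against an in-memory SQLite database through Python: the table names vehicles and trips, the sample rows, and the helper free_vehicles are assumptions made for illustration, and dates are stored as ISO strings so that plain comparison orders them correctly. Each named parameter is bound once, which mirrors the requirement that the start and end dates are supplied only once.

```python
import sqlite3

# Hypothetical schema and rows, loosely based on the sample data in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicles (registration TEXT PRIMARY KEY);
CREATE TABLE trips (trip_id INTEGER, departure TEXT, return_date TEXT, registration TEXT);
INSERT INTO vehicles VALUES ('PY09 XRH'), ('PY56 BZT'), ('PY06 BYP');
INSERT INTO trips VALUES
    (73180, '2012-02-07', '2012-02-08', 'PY09 XRH'),
    (73182, '2012-02-07', '2012-02-10', 'PY56 BZT');
""")

def free_vehicles(start, end):
    """Return registrations with no trip intersecting [start, end]."""
    sql = """
        SELECT v.registration
          FROM vehicles v
         WHERE NOT EXISTS (SELECT 1
                             FROM trips t
                            WHERE t.registration = v.registration
                              AND t.return_date >= :start
                              AND t.departure <= :end)
         ORDER BY v.registration
    """
    return [row[0] for row in conn.execute(sql, {"start": start, "end": end})]

# PY56 BZT is out until 10 Feb, so it is excluded for 9-11 Feb.
available = free_vehicles("2012-02-09", "2012-02-11")
print(available)
```

A vehicle with no trips at all (PY06 BYP here) is returned as well, because NOT EXISTS is trivially true for it.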
Related
I'm working on an MRP simulation in which I have to subtract demand or add supply qty to available stock and I hope you can be of support. Find below the result I want to achieve.
I have 1 value for stock = 22 and a lot of values for future demand/supply on specific dates.
Part     Stock                         Demand/Supply qty  Demand/Supply Date  Result
1000680  22                            -1                 2023-01-01          21
1000680  21* what I want to achieve    -15                2023-01-02          6* expected outcome
1000680  6* what I want to achieve     +10                2023-01-03          16* expected outcome
I'm still on the SQL learning curve. I started to add rownumbers to the lines to make sure that the sequence is correct:
select
    part,
    rownum = ROW_NUMBER() OVER (ORDER BY part, mrp_due_date),
    current_stock_qty,
    demand_supply_qty,
    current_stock - qty as new_stock_qty, -- if demand
    current_stock + qty as new_stock_qty, -- if supply
    mrp_due_date
from #base
Then I tried the LAG function to pick up the previous row's 'new_stock_qty' at each date, but this only worked for the first line.
So I probably need some kind of loop that first calculates stock minus demand and then uses the result as the new stock for the next row.
I have looked through similar questions asked on this site, but I find it difficult to define my solution based on that information.
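The expected Result column above is a running total: the opening stock plus the cumulative sum of the signed demand/supply quantities in date order, so no explicit loop over previous results is needed. A minimal sketch in Python, using the three sample rows (the variable names are illustrative only):

```python
from itertools import accumulate

opening_stock = 22
# (due date, signed quantity): negative for demand, positive for supply
movements = [("2023-01-01", -1), ("2023-01-02", -15), ("2023-01-03", 10)]

# Sort by date so the running total follows the MRP sequence.
movements.sort()

# accumulate() yields the opening stock first, then each running balance;
# drop the first element to keep one balance per movement.
balances = list(accumulate((qty for _, qty in movements), initial=opening_stock))[1:]

for (due_date, qty), balance in zip(movements, balances):
    print(due_date, qty, balance)
```

In SQL the same idea is usually expressed as a windowed running sum, along the lines of current_stock_qty + SUM(demand_supply_qty) OVER (PARTITION BY part ORDER BY mrp_due_date), rather than with LAG.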
I'm trying to figure out the number of working days between two dates. The table (dfDates) is laid out as follows:
Key  StartDateKey  EndDateKey
1    20171227      20180104
2    20171227      20171229
I have another table (dfDimDate) with all the relevant date keys and whether the date key is a working day or not:
DateKey   WorkDayFlag
20171227  1
20171228  1
20171229  1
20171230  0
20171231  0
20180101  0
20180102  1
20180103  1
20180104  1
I'm expecting a result as so:
Key  WorkingDays
1    6
2    3
So far (I realise this isn't complete to get me the above result), I've written this:
workingdays = []
for i in range(0, len(dfDates)):
    value = dfDimDate.filter((dfDimDate.DateKey >= dfDates.collect()[i][1]) & (dfDimDate.DateKey <= df.collect()[i][2])).agg({'WorkDayFlag': 'sum'})
    workingdays.append(value.collect())
However, only null values are being returned. Also, I've noticed this is very slow and took 54 seconds before it errored.
I think I understand what the error is about but I'm not sure how to fix it. Also, I'm not sure how to optimise the command so it runs faster. I'm looking for a solution in pyspark or spark SQL (whichever is easiest).
Many thanks,
Carolina
Edit: The error below was resolved thanks to a suggestion from @samkart, who said to put the agg after the filter:
AnalysisException: Resolved attribute(s) DateKey#17075 missing from sum(WorkDayFlag)#22142L in operator !Filter ((DateKey#17075 <= 20171228) AND (DateKey#17075 >= 20171227)).;
A possible and simple solution:
from pyspark.sql import functions as F

dfDates \
    .join(dfDimDate, dfDimDate.DateKey.between(dfDates.StartDateKey, dfDates.EndDateKey)) \
    .groupBy(dfDates.Key) \
    .agg(F.sum(dfDimDate.WorkDayFlag).alias('WorkingDays'))
That is, first join the two datasets in order to link each row of dfDates with all the dfDimDate rows in its range (dfDates.StartDateKey <= dfDimDate.DateKey <= dfDates.EndDateKey).
Then simply group the joined dataset by Key and sum WorkDayFlag to get the number of working days in each range.
In the solution you proposed, you are performing the calculation directly on the driver, so you are not taking advantage of the parallelism that Spark offers. This should be avoided when possible, especially for large datasets.
Apart from that, you are calling collect repeatedly inside the for-loop, even for the same data, which slows things down further.
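Since Spark SQL was also mentioned as an option, the same range join and grouped sum can be written as a single SQL statement. The sketch below is an assumption-laden stand-in: it runs the statement in an in-memory SQLite database through Python only so that it is self-contained, with the sample rows from the question loaded into tables named after the two DataFrames. The query text itself uses only standard JOIN ... BETWEEN and GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE dfDates ("Key" INTEGER, StartDateKey INTEGER, EndDateKey INTEGER)')
conn.execute("CREATE TABLE dfDimDate (DateKey INTEGER, WorkDayFlag INTEGER)")
conn.executemany("INSERT INTO dfDates VALUES (?, ?, ?)",
                 [(1, 20171227, 20180104), (2, 20171227, 20171229)])
conn.executemany("INSERT INTO dfDimDate VALUES (?, ?)",
                 [(20171227, 1), (20171228, 1), (20171229, 1),
                  (20171230, 0), (20171231, 0), (20180101, 0),
                  (20180102, 1), (20180103, 1), (20180104, 1)])

# Range join: every calendar row whose key lies inside the date range,
# then one summed flag per Key.
query = """
    SELECT d."Key", SUM(c.WorkDayFlag) AS WorkingDays
      FROM dfDates d
      JOIN dfDimDate c
        ON c.DateKey BETWEEN d.StartDateKey AND d.EndDateKey
     GROUP BY d."Key"
     ORDER BY d."Key"
"""
working_days = conn.execute(query).fetchall()
print(working_days)
```

Comparing the integer date keys works here because the yyyymmdd layout sorts in calendar order.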
I'm sadly out of ideas. I'm currently learning COGNOS Analytics and I could use your help.
I have a crosstab that looks like this; it comes from a different system that uses the same source structure. I use a company account and am only a user, so sadly I cannot write SQL or any scripts!
MIS0 MIS1 MIS3 MIS6
2016 0,0 0,1 0,3 0,6
2017 0,0 0,1 0,4 0,7
2018 0,0 0,2 0,4 0,7
I replicated this in COGNOS but cannot get one thing right (the real report is much more complicated than this, but I think this is the core of it).
Explanation:
MIS = months in service
years = year of product manufacture
values = (faults / products manufactured (that year) and sold) * 1000
A fault has a property MIS (the month in service in which it happened), and a product has a property along the lines of dateOfManufacture.
So, the problem: MIS6, for example, means a fault that happened within 6 months of purchase. The complication is that an MIS3 fault logically counts as an MIS6 fault too.
So I need to create a data element, a filter, or some other trick that would enable me to:
select the faults relevant for MIS from 0 to X, where X is the number in the column header (0, 1, 3, 6, ...), based of course on the year of manufacture. I'm limited by my user rights, so if your suggestion involves writing a script: thank you, you rock! :) But I won't be able to do it via script.
Excuse the lack of details, but named variables and any code are covered by the confidentiality I'm bound by. :(
Thank you for your time and have a nice weekend!
Fault
MIS: 2
ProductID: <121212>
Product
ProductID: <121212>
Date of assembly: 25.02.2020
(MIS: gets copied to the product fault when the fault occurs)
The table is supposed to show faults that happened within specific months in service. That means that if a fault occurred at 2 months in service, as in the example above, it should be counted in columns MIS3 and MIS6 and not in the MIS1 and MIS0 statistics, since the fault didn't occur within 1 month but at 2.
For example, the first row, second column says: find the products that were manufactured in 2016 and count how many faults they had in their first month in service. Divide this number by the number of products found and multiply by 1000 (faults per 1000 products).
As you can probably see, the problem occurs when you move to the next column in the same row: find the products that were manufactured in 2016, count how many faults they had within 3 months in service (= 1, 2, 3 included), then divide by the number of products made and multiply by 1000.
When I set up the crosstab I need to use an interval (MIS0 to MIS1, 3, 6) with a floating maximum, but I can't work out how to build it.
Try with a list first. If this works, we can convert the list to a crosstab.
Let's start by isolating the metric in the context of time.
This would be your first column, covering one month. Create a data item [Month 1 Faults] like this:
if ([Year] = 2016 and [Month] = 1) then ([Faults]) else (0)
The next column covers both month 1 and month 2. We use in (1, 2) to accomplish this.
Create a data item [Month 1 & 2 Faults] like this:
if ([Year] = 2016 and [Month] in (1, 2)) then ([Faults]) else (0)
Repeat this logic for all of the other data items.
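Although the asker cannot run scripts, the cumulative rule the data items have to reproduce is easy to state outside Cognos: a fault counts toward column MIS X when its MIS value is at most X, and each cell is that count per 1000 products of the manufacture year. The sketch below is only an illustration of that arithmetic. The product counts and the fault list are invented for the example, not taken from the question.

```python
# Hypothetical inputs: products built per year, and one (year, MIS) pair per fault.
products_per_year = {2016: 2000, 2017: 4000}
faults = [(2016, 0), (2016, 2), (2016, 5),
          (2017, 1), (2017, 3), (2017, 3), (2017, 7)]
thresholds = [0, 1, 3, 6]  # column headers MIS0, MIS1, MIS3, MIS6

def rate_per_1000(year, max_mis):
    """Faults with MIS <= max_mis per 1000 products manufactured in `year`."""
    count = sum(1 for y, mis in faults if y == year and mis <= max_mis)
    return count * 1000 / products_per_year[year]

# One row per manufacture year, one cumulative cell per threshold.
crosstab = {year: [rate_per_1000(year, x) for x in thresholds]
            for year in products_per_year}

for year, row in crosstab.items():
    print(year, row)
```

The 2016 fault at MIS 2 shows up in the MIS3 and MIS6 cells but not in MIS0 or MIS1, and the 2017 fault at MIS 7 appears in none of the four columns.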
In this example, there are 5 periods of actual balances and the implied depreciation rates. Starting in Period 6, need the Balance to be calculated on previous period balance ($8,177,480) * the current period depreciation rate (-1.50%) and so on. I've heard recursive CTE but I am not familiar with them.
Period DeprRate Balance Comment
1 0% $10,000,000 Actual Values
2 -1.62% $9,838,000 Actual Values
3 -7.41% $9,109,004 Actual Values
4 -8.00% $8,380,284 Actual Values
5 -2.42% $8,177,481 Actual Values
6 -1.50% null should be $8,177,481*(1-.015)
7 -1.50% null should be Pd 6 Calc Balance *(1-.015)
8 -5.73% null should be Pd 7 Calc Balance *(1-.0573)
9 -4.13% null should be Pd 8 Calc Balance *(1-.0413)
10 -1.50% null should be Pd 9 Calc Balance *(1-.015)
CREATE TABLE Table1
    ([Period] int, [DeprRate] float, Balance integer)
;
INSERT INTO Table1
    ([Period], [DeprRate], Balance)
VALUES
    (1,0,10000000),
    (2,-0.0162,9838000),
    (3,-0.0741,9109004.2),
    (4,-0.08,8380283.864),
    (5,-0.0242,8177480.9944912),
    (6,-0.015,null),
    (7,-0.015,null),
    (8,-0.0573,null),
    (9,-0.0413,null),
    (10,-0.015,null)
"This seems relatively easy, but can't get it."
Yes, it is. Did you follow these steps?
"I have 10 periods of actual balances and the implied depreciation rates."
Step 1 : Create a table (Table_1) and populate it with these values.
" Starting in Period 11, need the Balance to be calculated on previous period balance * the current period depreciation rate."
Step 2 : Create a query for calculation of new rates based on the values of previous table, execute it and populate it to new table (Table_2).
" Period 11 isn't difficult if that's all that was needed by using lag. Problem is Period 12-20 need to be calculating current period balance on previous period calculated balance multiplied by the current period depreciation rate."
Step 3 : Two options here - one is through a recursive query as 'Vinit' commented. Another option (easy) is to repeat Step 2 and append to Table_2.
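The recursive-query option can be sketched as follows. This is a hedged illustration rather than a tested solution for the asker's database: it uses SQLite's WITH RECURSIVE syntax through Python so that it runs standalone, stores Balance as REAL instead of integer to keep the decimals, and anchors on the last period that has an actual balance. The CTE name calc is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Period INTEGER, DeprRate REAL, Balance REAL)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [
    (1, 0, 10000000), (2, -0.0162, 9838000), (3, -0.0741, 9109004.2),
    (4, -0.08, 8380283.864), (5, -0.0242, 8177480.9944912),
    (6, -0.015, None), (7, -0.015, None), (8, -0.0573, None),
    (9, -0.0413, None), (10, -0.015, None),
])

query = """
    WITH RECURSIVE calc(Period, Balance) AS (
        -- anchor: the last period with an actual balance
        SELECT Period, Balance
          FROM Table1
         WHERE Period = (SELECT MAX(Period) FROM Table1 WHERE Balance IS NOT NULL)
        UNION ALL
        -- step: previous calculated balance times (1 + current rate)
        SELECT t.Period, c.Balance * (1 + t.DeprRate)
          FROM calc c
          JOIN Table1 t ON t.Period = c.Period + 1
    )
    SELECT Period, Balance FROM calc ORDER BY Period
"""
projected = dict(conn.execute(query).fetchall())
for period, balance in projected.items():
    print(period, round(balance, 2))
```

Because the rates are stored as negative numbers, multiplying by (1 + DeprRate) is the same as the (1 - .015) form used in the question's comments. Other databases spell the recursive CTE slightly differently (SQL Server, for instance, omits the RECURSIVE keyword).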
=======
Knowledge sharing / value addition to your question: Depreciation is an accounting concept that is usually taken into account either at year end (closing of the books) or at the end of an asset's life. It is tricky because at least two different calculations usually have to be performed, one to satisfy tax compliance and one for management accounting requirements. Additional calculations may also have to be carried out for each type of asset in order to determine the best possible option.
Although you did not include a date column in your sample data, you should write the script so that it calculates and populates the depreciated values as of a particular date. You can also arrange for the script to be executed by a trigger or by a scheduled job (job agent).
Hope this helps.
I want something unusual here. I have a table named EMP_INFO which contains a few details about each employee (i.e. Name, Designation, JOIN_FROM, JOIN_TO). I am trying to figure out the term of each employee on a yearly basis. I have the following kinds of data:
EMP_ID EMP_DESIG JOIN_FROM JOIN_TO Query Result
1 Supervisor 01-05-11 30-04-13 Should Display
2 Supervisor 15-06-10 31-12-12 Should Display
3 Jobar 01-01-12 31-12-13 Should Display
4 SR Superior 01-12-11 31-12-15 Should Display
5 Supervisor 01-05-11 31-12-13 Should Display
6 Supervisor 01-05-11 31-12-13 Should Display
7 Supervisor 01-05-11 31-12-13 Should Display
8 Supervisor 01-02-12 15-06-13 Should Display
9 SR Superior 16-03-10 18-11-11 Should Display
10 SR Superior 16-06-05 18-11-11 Should Display
11 Jobar 30-11-11 31-12-13 Should Display
12 Superior 02-02-05 31-12-20 Should Display
13 Jobar 30-11-11 31-12-13 Should Display
14 Jobar 30-11-09 31-12-10 Should Not Display
Basically, I have a date range in my report, let's say From "01-Jun-11" To "31-Dec-13". From the record set above, the report should retrieve every record marked "Should Display", since each of those terms falls at least partly within this date range.
I have tried using BETWEEN, but I believe it will not work.
If anyone can help me with this, it would be appreciated.
Thanks in advance. One more thing: if these details are not enough to understand the problem, let me know and I will add more.
Modified
Query which I tried
SELECT EI.*
  FROM EMP_INFO EI,
       (SELECT
            TO_DATE('01-JUN-2011','DD-MON-YYYY') A,
            TO_DATE('31-DEC-2013','DD-MON-YYYY') B FROM DUAL) X
 WHERE
       (EI.JOIN_FROM IS NOT NULL AND EI.JOIN_TO IS NOT NULL)
   AND (
           X.A BETWEEN EI.JOIN_FROM AND EI.JOIN_TO
       AND X.B BETWEEN EI.JOIN_FROM AND EI.JOIN_TO
        OR (EI.JOIN_FROM >= X.B AND EI.JOIN_TO <= X.A) )
Modified: added a column (Query Result) to the table above, which contains the expected result for each record.
So you simply want all records where the join time is in the given time range? That would be:
SELECT *
  FROM EMP_INFO
 WHERE JOIN_FROM BETWEEN
           TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN') AND
           TO_DATE('31-DEC-2013','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
   AND JOIN_TO BETWEEN
           TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN') AND
           TO_DATE('31-DEC-2013','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN');
EDIT: Sorry, I got it now. You are looking for all time ranges that overlap with the given range. That would be: ranges that start before and end within, ranges that start before and end after, ranges that start within and end within, and ranges that start within and end after. Another way to express this is: either the given time range's start is within the other time range, or the other time range's start is within the given time range. Here is the corresponding statement:
SELECT *
  FROM EMP_INFO
 WHERE JOIN_FROM BETWEEN
           TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN') AND
           TO_DATE('31-DEC-2013','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
    OR TO_DATE('01-JUN-2011','DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
           BETWEEN JOIN_FROM AND JOIN_TO;
And here is the SQL fiddle: http://sqlfiddle.com/#!4/b58b3/3
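As a quick sanity check of the overlap rule against the sample rows, the sketch below applies it in Python to four of the employees from the question. The helper names are illustrative, and the check assumes every term has JOIN_FROM on or before JOIN_TO. It also compares the BETWEEN-style formulation used above with the shorter two-comparison form (term starts on or before the range end, and term ends on or after the range start).

```python
from datetime import datetime

def parse(text):
    """Parse the DD-MM-YY dates used in the sample table."""
    return datetime.strptime(text, "%d-%m-%y").date()

# EMP_ID -> (JOIN_FROM, JOIN_TO), a subset of the rows in the question
terms = {
    1: ("01-05-11", "30-04-13"),
    9: ("16-03-10", "18-11-11"),
    12: ("02-02-05", "31-12-20"),
    14: ("30-11-09", "31-12-10"),
}
range_start, range_end = parse("01-06-11"), parse("31-12-13")

def overlaps(join_from, join_to):
    # Two-comparison form of the overlap test.
    return join_from <= range_end and join_to >= range_start

def overlaps_between(join_from, join_to):
    # Same logic as the SQL statement: either start lies inside the other range.
    return (range_start <= join_from <= range_end) or (join_from <= range_start <= join_to)

displayed = [emp for emp, (f, t) in terms.items() if overlaps(parse(f), parse(t))]
print(displayed)
```

Only EMP_ID 14 is dropped, since its term ends before the report range begins, which matches the "Should Not Display" marker in the table.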
Convert both values to the same format before comparing them; there may be a time component in the dates stored in the database. The previous answer was wrong.