Calculate MTTF with SQL query - sql-server

I'm trying to calculate the expected average time between the end of a fault and the beginning of the next one (two separate columns) [MTTF].
MTTF = Mean Time To Failure: the average time from the end of a fault to the beginning of the next one.
I have already asked a similar question and it was answered very professionally. I love this forum.
I have to calculate the difference between the Failure date of each row and the End_Of_Repair date of the previous row (a separate column), convert it to hours, and divide by the number of intervals to get the average time.
Intervals = number of rows - 1
This is my table:
Failure |Start_Repair |End_Of_Repair |Line |Operator
------------------------|------------------------|------------------------|---------|--------
2019-06-26 06:30:00 |2019-06-26 10:40:00 |2019-06-27 12:00:00 |A |Mike
2019-06-28 00:10:00 |2019-06-28 02:40:00 |2019-06-29 01:12:00 |A |Loty
2019-06-30 10:10:00 |2019-06-30 02:40:00 |2019-07-01 00:37:00 |B |Judy
2019-07-02 12:01:00 |2019-07-02 14:24:00 |2019-07-05 00:35:00 |B |Judy
2019-07-06 07:08:00 |2019-07-06 15:46:00 |2019-07-07 02:30:00 |A |Mike
2019-07-07 08:22:00 |2019-07-08 05:19:00 |2019-07-08 08:30:00 |B |Loty
2019-07-29 04:10:00 |2019-07-29 07:40:00 |2019-07-29 14:00:05 |A |Judy
So I have to take the difference between the Failure and End_Of_Repair columns: the second row minus the first, the third minus the second, and so on, divided by the calculated intervals (which is the number of rows minus 1, since I start from row 2 minus row 1).
In a nutshell, an average computed across the two columns.
I have attached an image to illustrate the idea better.
So I need the seventh row of the Failure column minus the sixth row of the End_Of_Repair column, the sixth row of Failure minus the fifth row of End_Of_Repair, and so on until I get to the first.
I tried:
SELECT line,
       DATEDIFF(hour, MIN(End_Of_Repair), MAX(Failure)) / NULLIF(COUNT(*) - 1, 0) AS 'intervals'
FROM Test_Failure
GROUP BY line
But the results are:
A = 253
B = 76
The result should be:
A = (12.1 + 173.93 + 529.6) / 3 ≈ 238 h
B = (35.4 + 55.78) / 2 = 45.59 h

One method would be to use LAG to get the value from the previous row; then you can average the difference in hours between the 2 times:
WITH CTE AS(
    SELECT V.Failure,
           V.Start_Repair,
           V.End_Of_Repair,
           V.Line,
           V.Operator,
           LAG(V.End_Of_Repair) OVER (PARTITION BY V.Line ORDER BY V.Failure) AS LastRepair
    FROM (VALUES('2019-06-26T06:30:00','2019-06-26T10:40:00','2019-06-27T12:00:00','A ','Mike'),
                ('2019-06-28T00:10:00','2019-06-28T02:40:00','2019-06-29T01:12:00','A ','Loty'),
                ('2019-06-30T10:10:00','2019-06-30T02:40:00','2019-07-01T00:37:00','B ','Judy'),
                ('2019-07-02T12:01:00','2019-07-02T14:24:00','2019-07-05T00:35:00','B ','Judy'),
                ('2019-07-06T07:08:00','2019-07-06T15:46:00','2019-07-07T02:30:00','A ','Mike'),
                ('2019-07-07T08:22:00','2019-07-08T05:19:00','2019-07-08T08:30:00','B ','Loty'),
                ('2019-07-29T04:10:00','2019-07-29T07:40:00','2019-07-29T14:00:05','A ','Judy'))V(Failure, Start_Repair, End_Of_Repair, Line, Operator))
SELECT CTE.Line,
       AVG(DATEDIFF(HOUR, CTE.LastRepair, CTE.Failure)) AS FaultHours
FROM CTE
GROUP BY CTE.Line;
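Note that DATEDIFF(HOUR, ...) returns whole hours and AVG over integers truncates, so this won't show the fractional hours in the expected output. A minimal sketch of a variant that keeps the fraction by counting minutes and dividing by 60.0 (it assumes the same CTE as above):
-- same CTE as in the previous query assumed
SELECT CTE.Line,
       AVG(DATEDIFF(MINUTE, CTE.LastRepair, CTE.Failure) / 60.0) AS FaultHours
FROM CTE
GROUP BY CTE.Line;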

Related

Alternative solutions to an array search in PostgreSQL

I am not sure if my database design is good for this tricky case and I also ask for help how the query for this could look like.
I plan a query with the following table:
search_array | value | id
-----------------------+-------+----
{XYa,YZb,WQb} | b | 1
{XYa,YZb,WQb,RSc,QZa} | a | 2
{XYc,YZa} | c | 3
{XYb} | a | 4
{RSa} | c | 5
There are 5 main elements in the search_array: XY, YZ, WQ, RS, QZ, and 3 values: a, b, c that are concatenated to each element.
Each row has also one value: a, b or c.
My aim is to find all rows that fit a specific row in this sense: first it should be checked whether they have any main elements in common in their search_arrays (marked yellow in the example).
As example:
Row id 4 and row id 5 wouldn't match because XY != RS.
Row id 1, 2 and 3 would match two times because they all have XY and YZ.
Row id 1 and 2 would even match three times because they also have WQ in common.
And second: if there is a main element match, it should be 'cross-checked' whether the lowercase letters after the main elements match the value of the other row.
As an example: the only match for row id 1 in the table would be row id 4, because they both search for XY and the lowercase letters after the elements match each value of the two rows.
Another match would be row id 2 and 5, with RS and search c to value c and search a to value a (marked green and orange).
My idea was to cut the search_array elements in the query into two parts with the RIGHT and LEFT string functions, but I don't know how to combine the subqueries for this search.
Or would a completely different solution be faster? For example, splitting the search array into another table with the columns 'foreign key' to the main table, 'main element' and 'searched_value'. I am not sure this is the best solution, because the program would constantly have to go back to the main table to find two rows out of 3 million and compare their searched_values to the values.
Thank you very much for your answers and your time!
You'll have to represent the data in a normalized fashion. I'll do it in a WITH clause, but it would be better to store the data in this fashion to begin with.
WITH unravel AS (
    SELECT t.id, t.value,
           substr(u.val, 1, 2) AS arr_main,
           substr(u.val, 3, 1) AS arr_val
    FROM mytable AS t
       CROSS JOIN LATERAL unnest(t.search_array) AS u(val)
)
SELECT a.id AS first_id,
       a.value AS first_value,
       b.id AS second_id,
       b.value AS second_value,
       a.arr_main AS main_element
FROM unravel AS a
   JOIN unravel AS b
      ON a.arr_main = b.arr_main
         AND a.arr_val = b.value
         AND b.arr_val = a.value;

get dates in range of provided start and end dates

I have records in a database with start and end dates.
I want to filter out all records that are in the range of a start and end date.
I have two queries that work: one gives me records between two dates, and the other gives me records that are in range of the start or end date.
How do I combine these two LINQ queries into a single one so that it works both ways?
Linq 1
schedulelist = (From c In db.Table1 Where c.StartDate.Value.Date <= objStartDate.Date And c.EndDate.Value.Date >= objStartDate.Date And
c.UserID = CInt(Session("UserID"))).ToList()
Linq 2
schedulelist = (From c In db.Table1
                Where (c.StartDate.Value.Date >= objStartDate.Date And c.StartDate.Value.Date <= objEndDate.Date) Or
                      (c.EndDate.Value.Date >= objStartDate.Date And c.EndDate.Value.Date <= objEndDate.Date) And
                      c.UserID = CInt(Session("UserID"))).ToList()
In the DB table I have these values:
StartDate EndDate
2019-10-08 07:00:00.000 2019-10-30 07:00:00.000
2019-10-15 07:00:00.000 2019-10-27 07:00:00.000
If I search with objStartDate 15/10/2019 00:00:00 and objEndDate 27/10/2019 00:00:00,
I get record no. 2 when I run Linq 2
and record no. 1 when I run Linq 1.
What I should get is both records from either Linq 1 or Linq 2.
So what is the better way to combine both into one query, or is this query approach all wrong?
The simplest query to check if date range 1 intersects date range 2 is this:
... WHERE range2_end > range1_start
AND range1_end > range2_start
This covers all cases: range 1 fully inside range 2, fully containing it, starting inside it, or ending inside it.
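Applied to the LINQ query in the question, that condition looks roughly like the sketch below (an assumption: the boundary days should count as overlapping, hence <= / >= on the .Date values):
schedulelist = (From c In db.Table1
                Where c.StartDate.Value.Date <= objEndDate.Date And
                      c.EndDate.Value.Date >= objStartDate.Date And
                      c.UserID = CInt(Session("UserID"))).ToList()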

Fill secondly data from Q KDB+

I have a CSV file with some high-frequency stock price data, and I'd like to get second-by-second price data from the table.
In each file there are columns named date, time, symbol, price, volume, and so on.
There are some seconds with no trading, so data is missing for some seconds.
I'm wondering how I could fill in the missing data in q to get the per-second data from 9:30 to 16:00 in full. If a price is missing, just use the most recent price as the price for that second.
I'm considering writing a loop, but I don't know exactly how to do that.
Simplifying a little, I'll assume you have some random timestamps in your dataset like this:
time price
--------------------------------------
2015.01.20D22:42:34.776607000 7
2015.01.20D22:42:34.886607000 3
2015.01.20D22:42:36.776607000 4
2015.01.20D22:42:37.776607000 8
2015.01.20D22:42:37.886607000 7
2015.01.20D22:42:39.776607000 9
2015.01.20D22:42:40.776607000 4
2015.01.20D22:42:41.776607000 9
so there are some missing seconds there. I'm going to call this table t. So if you do a by-second type of query, obviously the seconds that are missing are still missing:
q)select max price by time.second from t
second | price
--------| -----
22:42:34| 7
22:42:36| 4
22:42:37| 8
22:42:39| 9
22:42:40| 4
22:42:41| 9
To get missing seconds, you have to join a list of nulls. In this case we know the data goes from 22:42:34 to 22:42:41, but in reality you'll have to find the min/max time and use that to create a temporary "null" table to join against:
q)([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N)
second price
--------------
22:42:34
22:42:35
22:42:36
22:42:37
22:42:38
22:42:39
22:42:40
22:42:41
Then left join:
q)([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N) lj select max price by time.second from t
second price
--------------
22:42:34 7
22:42:35
22:42:36 4
22:42:37 8
22:42:38
22:42:39 9
22:42:40 4
22:42:41 9
You can use fills or whatever your favourite filling heuristic is after that.
q)fills `second xasc ([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N) lj select max price by time.second from t
second price
--------------
22:42:34 7
22:42:35 7
22:42:36 4
22:42:37 8
22:42:38 8
22:42:39 9
22:42:40 4
22:42:41 9
(Note the sort on second before fills!)
By the way, for larger tables this will be much faster than a loop. Loops in q are generally a bad idea.
EDIT
You could use a comma join too; both tables need to be keyed on the second column:
t,t1
(where t1 is the null-filled table keyed on second)
I haven't tested it, but I suspect it would be slightly faster than the lj version.
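A minimal sketch of that variant, using the same example seconds (t1 is the keyed null rack as described; t2 is a hypothetical name for the keyed by-second aggregate; the order t1,t2 matters so that the real prices overwrite the nulls):
q)t1:([second:22:42:34 + til 1+`int$22:42:41-22:42:34] price:(1+`int$22:42:41-22:42:34)#0N)
q)t2:select max price by time.second from t
q)fills `second xasc 0!t1,t2   / join (upsert), unkey, sort, then forward-fill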
Using aj which is one of the most powerful features of KDB:
q)data
sym time price size
----------------------------
MS 10:24:04 93.35974 8
MS 10:10:47 4.586986 1
APPL 10:50:23 0.7831685 1
GOOG 10:19:52 49.17305 0
The in-memory table needs to be sorted by sym,time, with the g# attribute applied to the sym column:
q)data:update `g#sym from `sym`time xasc data
q)meta data
c | t f a
-----| -----
sym | s g
time | v
price| f
size | j
Creating a rack table, intervalized per second per sym:
q)rack: `sym`time xasc (select distinct sym from data) cross ([] time:{x[0]+til 1+`int$x[1]-x[0]}(min;max)@\:data`time)
Using aj to join the data :
q)aj[`sym`time; rack; data]

In SSRS, how can I add a row to aggregate all the rows that don't match a filter?

I'm working on a report that shows transactions grouped by type.
Type Total income
------- --------------
A 575
B 244
C 128
D 45
E 5
F 3
Total 1000
I only want to provide details for transaction types that represent more than 10% of the total income (i.e. A-C). I'm able to do this by applying a filter to the group:
Type Total income
------- --------------
A 575
B 244
C 128
Total 1000
What I want to display is a single row just above the total row that has a total for all the types that have been filtered out (i.e. the sum of D-F):
Type Total income
------- --------------
A 575
B 244
C 128
Other 53
Total 1000
Is this even possible? I've tried using running totals and conditionally hidden rows within the group. I've tried Iif inside Sum. Nothing quite seems to do what I need and I'm butting up against scope issues (e.g. "the value expression has a nested aggregate that specifies a dataset scope").
If anyone can give me any pointers, I'd be really grateful.
EDIT: Should have specified, but at present the dataset actually returns individual transactions:
ID Type Amount
---- ------ --------
1 A 4
2 A 2
3 B 6
4 A 5
5 B 5
The grouping is done using a row group in the tablix.
One solution is to solve that in the SQL source of your dataset instead of inside SSRS:
SELECT
    CASE
        WHEN CAST([Total income] AS FLOAT) / SUM([Total income]) OVER (PARTITION BY 1) >= 0.10 THEN [Type]
        ELSE 'Other'
    END AS [Type],
    [Total income]
FROM Source_Table
See also SQL Fiddle
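If the relabelled type is grouped in the same query rather than by the SSRS row group, the filtered-out types collapse into a single 'Other' row. A minimal sketch under the same assumption of a Source_Table that already holds one total per type:
SELECT [Type], SUM([Total income]) AS [Total income]
FROM (
    SELECT
        CASE
            WHEN CAST([Total income] AS FLOAT) / SUM([Total income]) OVER () >= 0.10 THEN [Type]
            ELSE 'Other'
        END AS [Type],
        [Total income]
    FROM Source_Table
) AS relabelled
GROUP BY [Type];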
Try to solve this in SQL, see SQL Fiddle.
SELECT I.*
,(
CASE
WHEN I.TotalIncome >= (SELECT Sum(I2.TotalIncome) / 10 FROM Income I2) THEN 10
ELSE 1
END
) AS TotalIncomePercent
FROM Income I
After this, create two sum groups.
SUM(TotalIncome * TotalIncomePercent) / 10
SUM(TotalIncome * TotalIncomePercent)
A second approach may be to use a calculated column in SSRS. Try to create a calculated column with the above CASE expression; if SSRS allows you to create it, you can use it in the same way as in the SQL approach.
1) To show income greater than 10%, use a row visibility condition like
=iif(reportitems!total_income.value/10 <= I.totalincome, true, false)
Here reportitems!total_income.value is the value of the total-income textbox, which is the total value of the detail group, and I.totalincome is the current field value.
2) Add one more row outside the detail group for the other income and use an expression such as
= reportitems!total_income.value - sum(iif(reportitems!total_income.value/10 <= I.totalincome, I.totalincome, nothing))

Merging Data to Run Specific Individual Analysis

I have two data sets. FIRST is a list of products and their daily prices from a supplier, and SECOND is a list of start and end dates (as well as other important data for analysis). How can I tell Stata to pull the price at the beginning date and the price at the end date from FIRST into SECOND for the given dates? Please note, if there is no exactly matching date I would like it to grab the last date available. For example, if SECOND has the date 1/1/2013 and FIRST has prices on ... 12/30/2012, 12/31/2012, 1/2/2013, ... it would grab the 12/31/2012 price.
I would usually do this with Excel, but I have millions of observations, and it is not feasible.
I have put an example of FIRST and SECOND below, as well as what the optimal solution would give as output, POST_SECOND.
FIRST
Product Price Date
1 3 1/1/2010
1 3 1/3/2010
1 4 1/4/2010
1 2 1/8/2010
2 1 1/1/2010
2 5 2/5/2010
3 7 12/26/2009
3 2 1/1/2010
3 6 4/3/2010
SECOND
Product Start Date End Date
1 1/3/2010 1/4/2010
2 1/1/2010 1/1/2010
3 12/26/2009 4/3/2010
POST_SECOND
Product Start Date End Date Price_Start Price_End
1 1/3/2010 1/4/2010 3 4
2 1/1/2010 1/1/2010 1 1
3 12/26/2009 4/3/2010 7 6
Here's a merge/keep/sort/collapse* solution that relies on using the last date. I altered your example data slightly.
/* Make Fake Data & Convert Dates to Date Format */
clear
input byte Product byte Price str12 str_date
1 3 "1/1/2010"
1 3 "1/3/2010"
1 4 "1/4/2010"
1 2 "1/8/2010"
2 1 "1/1/2010"
2 5 "2/5/2010"
3 7 "12/26/2009"
3 7 "12/28/2009"
3 2 "1/1/2010"
3 6 "4/3/2010"
4 8 "12/30/2012"
4 9 "12/31/2012"
4 10 "1/2/2013"
4 10 "1/3/2013"
end
gen Date = date(str_date,"MDY")
format Date %td
drop str_date
save "First.dta", replace
clear
input byte Product str12 str_Start_Date str12 str_End_Date
1 "1/3/2010" "1/4/2010"
2 "1/1/2010" "1/1/2010"
3 "12/27/2009" "4/3/2010"
4 "1/1/2013" "1/2/2013"
end
gen Start_Date = date(str_Start_Date,"MDY")
gen End_Date = date(str_End_Date,"MDY")
format Start_Date End_Date %td
drop str_*
save "Second.dta", replace
/* Data Transformation */
use "First.dta", clear
merge m:1 Product using "Second.dta", nogen
bys Product: egen ads = min(abs(Start_Date-Date))
bys Product: egen ade = min(abs(End_Date - Date))
keep if (ads==abs(Date - Start_Date) & Date <= Start_Date) | (ade==abs(Date - End_Date) & Date <= End_Date)
sort Product Date
collapse (first) Price_Start = Price (last) Price_End = Price, by(Product Start_Date End_Date)
list, clean noobs
*Some people are reshapers. Others are collapsers. Often both can get the job done, but I think collapse is easier in this case.
In Stata, I've never been able to get something like this to work nicely in one step (something you can do in SAS via a SQL call). In any case, I think you'd be better off creating an intermediate file from FIRST.dta and then merging that 2x on each of your StartDate and EndDate variables in SECOND.dta.
Say you have data for price adjustments from Jan 1, 2010 to Dec 31, 2013 (specified with varied intervals as you have shown above). I assume all the date variables are already in date format in FIRST.dta & SECOND.dta, and that variable names in SECOND do not have spaces in them.
tempfile prod prices
use FIRST.dta, clear
keep Product
duplicates drop
save `prod'
clear
set obs 1461  // one observation per day from 1 Jan 2010 to 31 Dec 2013
g Date=date("12-31-2009","MDY")+_n
format Date %td
cross using `prod'
merge 1:1 Product Date using FIRST.dta, assert(1 3) nogen
gsort +Product +Date /*this ensures the data are sorted properly for the next step */
replace Price=Price[_n-1] if Price==. & Product==Product[_n-1]
save `prices'
use SECOND.dta, clear
foreach i in Start End {
rename `i'Date Date
merge 1:1 Product Date using `prices', assert(2 3) keep(3) nogen
rename Price Price_`i'
rename Date `i'Date
}
This should work if I understand your data structures correctly, and it should address the issue being discussed in the comments to @Dimitriy's answer. I'm open to critiques on how to make this nicer, as it's something I've had to do a few times and this is how I usually go about it.
