sql server select the first occurrence of data change - sql-server

I have a SQL Server 2008 R2 database.
I have a table called hystrealdata which stores the production data of an automotive machine every n seconds. It is structured like this:
dataregvalue timestamp
--------------------------------------------------------------------------
0 1507190476
0 1507190577
0 1507190598
0 1507190628
1 1507190719
1 1507190750
1 1507190780
1 1507190811
1 1507190841
2 1507190861
2 1507190892
2 1507190922
2 1507190953
2 1507190983
5 1507190477
I need to select the first occurrence of a dataregvalue in the first row, then the difference between each new dataregvalue and the previous one. Next to this data I would like to have the first timestamp at which dataregvalue changes. An example of the select would be:
data_change timestamp
---------------------------
0 1507190476 <- first time in which the dataregvalue is 0
1 1507190719 <- first time in which the dataregvalue changes
1 1507190861 <- first time in which the dataregvalue changes
3 1507190477 <- first time in which the dataregvalue changes
If this is too difficult, it would be fine to have the information about the difference between dataregvalues in a new column like this:
dataregvalue data_change timestamp
---------------------------------------------
0 0 1507190476
1 1 1507190719
2 1 1507190861
5 3 1507190477
How can this be done?
Thanks in advance!

You can use the LAG analytic function to read the previous value in a partition. Note that LAG was introduced in SQL Server 2012, so it is not available on 2008 R2. For example:
Select
dataregvalue,
dataregvalue - LAG(dataregvalue,1) OVER (ORDER BY timestamp) as data_change,
timestamp
from MyTable
This returns the change on every row. Rows where the value changed have a non-zero data_change (greater than 0 as long as dataregvalue never decreases). The first row has a NULL data_change because there is no previous row.
Unfortunately, you can't refer to data_change in the WHERE clause of the same query. You'll have to use a CTE:
WITH changes as (
Select
dataregvalue,
dataregvalue - LAG(dataregvalue,1) OVER (ORDER BY timestamp) as data_change,
timestamp
from MyTable
)
select *
from changes
where
data_change >0 or
data_change is null
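Since the question mentions SQL Server 2008 R2, where LAG is not available, the same idea can be sketched with ROW_NUMBER and a self-join. This is only a sketch under assumptions: it uses the hystrealdata table name from the question, assumes timestamp values are unique (otherwise the row numbering is not deterministic), and compares with <> so that decreases are reported too.

```sql
-- Number the rows in timestamp order, then join each row to its predecessor.
WITH numbered AS (
    SELECT dataregvalue,
           [timestamp],
           ROW_NUMBER() OVER (ORDER BY [timestamp]) AS rn
    FROM hystrealdata
)
SELECT cur.dataregvalue,
       cur.dataregvalue - prev.dataregvalue AS data_change,  -- NULL on the first row
       cur.[timestamp]
FROM numbered AS cur
LEFT JOIN numbered AS prev
       ON prev.rn = cur.rn - 1
WHERE prev.rn IS NULL                          -- first row: no predecessor
   OR cur.dataregvalue <> prev.dataregvalue    -- value changed
ORDER BY cur.[timestamp];
```

The LEFT JOIN plays the role of LAG: prev is the previous row in timestamp order, and it is missing (all NULLs) for the very first row.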
The LAG function and the corresponding LEAD function can be used to detect gaps and islands in a sequence as well. In a gapless sequence, each row has an ID that is one greater than the previous one. Where there is a gap, the difference is greater than 1.
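As a rough illustration of the gap detection idea, the sketch below uses a made-up table variable @seq with a few missing ids. The table and column names are hypothetical, and LAG requires SQL Server 2012 or later.

```sql
-- Hypothetical sequence with gaps after 3 and after 8.
DECLARE @seq TABLE (id int PRIMARY KEY);
INSERT INTO @seq (id) VALUES (1), (2), (3), (7), (8), (12);

WITH steps AS (
    SELECT id,
           LAG(id, 1) OVER (ORDER BY id) AS prev_id
    FROM @seq
)
SELECT prev_id + 1 AS gap_start,   -- first missing id
       id - 1      AS gap_end      -- last missing id
FROM steps
WHERE id - prev_id > 1;            -- a difference above 1 marks a gap
```

With this sample data the query reports the missing ranges 4 to 6 and 9 to 11.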

Related

update date column of a table in a database

How can I update the date column of a table in a database (MSSQL) by 1 year for the first 1000 rows, 2 years for the second 1000 rows, and so on? I know how to implement it by assigning a temporary id, but is there a way to update the data in a loop?
For example:
Suppose I have 6000 rows in a table with a joined_date column ranging from 2012-01-01 to 2017-01-01, ordered ascending. I want to update the first thousand rows by increasing the date by 1 year, the 2nd thousand rows by 1 year as well and so on...
If my first thousand rows have a joined date in 2012, I want to update it to 2013, and if my 2nd thousand rows have a joined date in 2012 to 2013 then I want to increment it by 1 as well.
We can try assigning a row number to your table, then use it to do the updates:
WITH cte AS (
SELECT joined_date, ROW_NUMBER() OVER (ORDER BY joined_date) - 1 rn
FROM yourTable
)
UPDATE cte
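SET joined_date = DATEADD(year, CAST(rn / 1000 AS int) + 1, joined_date);
The trick here is that the first 1000 rows, which receive a row number of 0 up to and including 999, have an rn / 1000 value of 0 under integer division, to which we add 1 to get the number of years to add. The next 1000 records have 2 years added, and so on. Note that rn % 1000 would not work here: it gives the position within each block (0 to 999) rather than the block number. The cast to int is there because ROW_NUMBER() returns bigint, while DATEADD expects an int increment.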
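Before running an update like this, it may help to preview which block each row lands in. The following is a sketch only, reusing the yourTable and joined_date names from the answer. It shows the integer division rn / 1000 (the block number) next to rn % 1000 (the position inside the block) so the two are easy to tell apart.

```sql
-- Preview only: nothing is modified.
WITH cte AS (
    SELECT joined_date,
           ROW_NUMBER() OVER (ORDER BY joined_date) - 1 AS rn
    FROM yourTable
)
SELECT joined_date,
       rn,
       rn / 1000 AS block_index,        -- 0 for rows 0..999, 1 for rows 1000..1999, ...
       rn % 1000 AS position_in_block,  -- 0..999, repeating in every block
       DATEADD(year, CAST(rn / 1000 AS int) + 1, joined_date) AS new_joined_date
FROM cte
ORDER BY rn;
```

Rows that share the same joined_date can be numbered in any order, so adding a unique column to the ORDER BY would make the block boundaries repeatable.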

How to select top 3 maximum values from SQL Server column

I have values in SQL Server derived column sorted as descending i.e.
id Count    OR    id Count    OR    id Count
1  4              1  5              1  11
2  4              2  2              2  1
3  4              3  1              3  1
4  4              4  1              4  1
5  4              5  1              5  1
Now I want to select top 3 maximum values. How can I select so that query returns consistent result every time.
For example if values of Count are same which id's should be returned as top 3 maximums, similarly if 3rd value is matched with other values and if 2nd value is matched with other values then which id's should be returned by the query. And the result should be consistent every time I execute the query.
The WITH TIES option of the TOP clause returns all of the rows that tie with the last of the top values:
select top (3) with ties id, count from table1
order by count desc
Alternatively, if you wanted to return 3 values only, but make sure they are always the same 3 values, then you will need to use something else as a tie-breaker. In this case, it looks like your id column could be unique.
select top (3) id, count from table1
order by count desc, id
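To see the difference between the two queries, here is a small sketch that loads the second sample from the question into a hypothetical table variable @table1. It assumes id is unique, which is what makes the second query repeatable.

```sql
-- Hypothetical table variable holding the second sample (counts 5, 2, 1, 1, 1).
DECLARE @table1 TABLE (id int PRIMARY KEY, [Count] int);
INSERT INTO @table1 (id, [Count]) VALUES (1, 5), (2, 2), (3, 1), (4, 1), (5, 1);

-- WITH TIES: ids 3, 4 and 5 all tie for third place, so all five rows come back.
SELECT TOP (3) WITH TIES id, [Count]
FROM @table1
ORDER BY [Count] DESC;

-- Unique tie-breaker: always exactly ids 1, 2 and 3.
SELECT TOP (3) id, [Count]
FROM @table1
ORDER BY [Count] DESC, id;
```

Without WITH TIES and without a tie-breaker, SQL Server is free to return any one of ids 3, 4 or 5 as the third row, which is the inconsistency the question asks about.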

SQL Server query to display all columns but with distinct values in one of the columns (not grouping anything)

I have a table with 106 columns. One of those columns is a "Type" column with 16 types.
I want 16 rows, where the Type is distinct. So, row 1 has a type of "Construction", row 2 has a type of "Elevator PVT", etc.
Using Navicat.
From what I've found (and understood) so far, I can't use Distinct (because that looks across all rows), I can't use Group By (because that's for aggregating data, which I'm not looking to do), so I'm stuck.
Please be gentle- I'm really really new at this.
Below is a part of the table (how can I share this normally?)- it's really big so I didn't share the whole thing. Below is a partial result I'm looking for, where the Violation_Type is unique and the rest of the columns display.
Got it.. Sheesh... (took me forever, but got it...)
D_ID B_ID V_ID V_Type S_ID c_f d_y l_u p_s du_p
------ ------ ------- -------------- ------ ----- ------ ------ ----- ------
184 117 V 032 Elevator PVT 2 8 0 0
4 140 V 100 Construction 1 8 0 0
10 116 V 122 Electric 1 8 2005 0 0
11 117 V 033 Boiler Local 1 0 2005 0 0
You can use ROW_NUMBER for this:
SELECT *
FROM(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY V_Type ORDER BY (SELECT NULL))
FROM tbl
)t
WHERE rn = 1
Modify the ORDER BY depending on what row you want to prioritize.
From the documentation:
Returns the sequential number of a row within a partition of a result
set, starting at 1 for the first row in each partition.
This means that for every row within a partition (specified by the PARTITION BY clause), SQL Server assigns a number starting from 1, following the order specified in the ORDER BY clause.
ROW_NUMBER requires an ORDER BY clause. SELECT NULL tells SQL Server that we do not want to enforce a particular order. We just want the rows numbered within each partition.
The WHERE rn = 1 keeps only the rows that have a ROW_NUMBER of 1. This gives you one row for every V_Type available.
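As an example of modifying the ORDER BY, the sketch below keeps the row with the lowest D_ID for each V_Type. Choosing D_ID as the priority column is only an assumption for illustration. Any column from the table could be used instead.

```sql
SELECT *
FROM (
    SELECT *,
           -- Lowest D_ID within each V_Type gets rn = 1.
           ROW_NUMBER() OVER (PARTITION BY V_Type ORDER BY D_ID) AS rn
    FROM tbl
) AS t
WHERE rn = 1;
```

Unlike ORDER BY (SELECT NULL), this returns the same row per type on every run, provided D_ID is unique within each V_Type.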

Computing a field to identify chronological order of datapoints that share the same ID

I am using Microsoft SQL Server 2008 to try to identify the chronological order of data points. The goal is a filter field that lets me write a query that only includes the first and last record for each ID number, where multiple rows represent different data points from the same ID.
Here is an example of my current data and desired data to give a better idea of what I mean:
Current Data
ID Indicator Date
1 1 1988-02-11
1 1 1989-03-9
1 1 1993-04-3
1 1 2001-05-4
2 1 2000-01-01
2 1 2001-02-03
2 1 2002-04-22
3 1 1990-02-01
3 1 1998-02-01
3 1 1999-03-02
3 1 2000-04-02
4 0 NA
Desired Data
ID Indicator Date Order_Indicator
1 1 1988-02-11 1
1 1 1989-03-9 2
1 1 1993-04-3 3
1 1 2001-05-4 4
2 1 2000-01-01 1
2 1 2001-02-03 2
2 1 2002-04-22 3
3 1 1990-02-01 1
3 1 1998-02-01 2
3 1 1999-03-02 3
3 1 2000-04-02 4
4 0 NULL NULL
The field I want to create is the "Order_Indicator" field in the "Desired Data" table, and the only relevant records are those with Indicator = 1. With this information I would create a query that selects only the rows where Order_Indicator = 1 or Order_Indicator = MAX(Order_Indicator) for each "row group" that shares the same ID. Does anyone have any idea how I might go about this? I know I could do this very easily in Excel, but I need to do it in SQL Server so that it is reproducible for my colleagues.
Thank you so much in advance!
You can do this with the ranking functions:
select c.*,
(case when indicator = 1
then row_number() over (partition by id, indicator order by [date])
end) as OrderIndicator
from [current] c
This assigns a sequential number per id, ordered by date. The CASE expression takes care of the indicator = 0 case, leaving NULL there. (current is a reserved word in T-SQL, so the table name needs brackets.)
By the way, this assumes that "date" is being stored as a date.
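Use the query below:
select YourTable.ID,
YourTable.indicator,
case when date<>'NA' then date end as date,
case when indicator = 1 then row_number() over (partition by id, indicator order by date) end as Order_Indicator
from YourTable
This version assumes date is stored as text, with 'NA' for missing values. The row number is ordered by date rather than by ID, since every row in a partition shares the same ID.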
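The question's end goal, keeping only the first and last record per ID, can be sketched by numbering the rows in both directions. This is a sketch under assumptions: it uses the YourTable name from the answer above and assumes [Date] is stored as a real date type so that the ordering is chronological.

```sql
WITH ordered AS (
    SELECT ID, Indicator, [Date],
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] ASC)  AS rn_first,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC) AS rn_last
    FROM YourTable
    WHERE Indicator = 1          -- only these rows are relevant per the question
)
SELECT ID, Indicator, [Date], rn_first AS Order_Indicator
FROM ordered
WHERE rn_first = 1               -- earliest record for the ID
   OR rn_last = 1                -- latest record for the ID
ORDER BY ID, [Date];
```

Numbering in descending order avoids a separate MAX(Order_Indicator) lookup. An ID with a single record appears once, because that row is both first and last.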

Average calculation through time scale (SQL Server) - preparation for charting

I have the following table in SQL Server Express edition:
Time Device Value
0:00 1 2
0:01 2 3
0:03 3 5
0:03 1 3
0:13 2 5
0:22 1 7
0:34 3 5
0:35 2 6
0:37 1 5
The table is used to log the events of different devices which are reporting their latest values. What I'd like to do is to prepare the data in a way that I'd present the average data through time scale and eventually create a chart using this data. I've manipulated this example data in Excel in the following way:
Time Average value
0:03 3,666666667
0:13 4,333333333
0:22 5,666666667
0:34 5,666666667
0:35 6
0:37 5,333333333
So, at time 0:03 I need to take latest data I have in the table and calculate the average. In this case it's (3+3+5)/3=3,67. At time 0:13 the steps would be repeated, and again at 0:22,...
I'd like to keep everything within SQL Server (I wouldn't like to create a service in C# or similar that grabs the data and stores it in some other table), so I'd like to know the following:
is this the right approach or should I use some other concept of calculating the average for charting data preparation?
if yes, what's the best approach to implement it? Table view, function within the database, stored procedure (which would be called from the charting API)?
any suggestions on how to implement this?
Thank you in advance.
Mark
Update 1
In the meantime I had an idea for how to approach this problem. I'd kindly ask for your comments on it, and I still need some help getting the problem resolved.
So, the idea is to crosstab the table like this:
Time Device1Value Device2Value Device3Value
0:00 2 NULL NULL
0:01 NULL 3 NULL
0:03 3 NULL 5
0:13 NULL 5 NULL
0:22 7 NULL NULL
0:34 NULL NULL 5
0:35 NULL 6 NULL
0:37 5 NULL NULL
The query for this to happen would be:
SELECT Time,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 1) AS Device1Value,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 2) AS Device2Value,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 3) AS Device3Value
FROM dbo.Event S GROUP BY Time
What I'd still need to do is to write a user defined function and call it within this query which would write last available value in case of NULL and if the last available value doesn't exist it would leave NULL value. With this function I'd get the following results:
Time Device1Value Device2Value Device3Value
0:00 2 NULL NULL
0:01 2 3 NULL
0:03 3 3 5
0:13 3 5 5
0:22 7 5 5
0:34 7 5 5
0:35 7 6 5
0:37 5 6 5
And by having this results I'd be able to calculate the average for each time by only SUMing up the 3 relevant columns and dividing it by count (in this case 3). For NULL I'd use 0 value.
Can anybody suggest how to create a user defined function for replacing NULL values with latest value?
Update 2
Thanks Martin.
This query worked but it took almost 21 minutes to go through the 13.576 lines which is far too much.
The final query I used was:
SELECT Time,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 1 ORDER BY e.Time DESC) AS Device1Value,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 2 ORDER BY e.Time DESC) AS Device2Value,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 3 ORDER BY e.Time DESC) AS Device3Value
FROM dbo.Event S GROUP BY Time
but I've extended it to 10 devices.
I agree that this is not the best way to do it. Is there any other way to prepare the data for the average calculation? This approach takes far too much processing.
Here's one way. It uses the "Quirky Update" approach to filling in the gaps. This relies on an undocumented behaviour so you may prefer to use a cursor for this.
-- Sample data from the question
DECLARE @SourceData TABLE([Time] TIME, Device INT, value FLOAT)
INSERT INTO @SourceData
SELECT '0:00',1,2 UNION ALL
SELECT '0:01',2,3 UNION ALL
SELECT '0:03',3,5 UNION ALL
SELECT '0:03',1,3 UNION ALL
SELECT '0:13',2,5 UNION ALL
SELECT '0:22',1,7 UNION ALL
SELECT '0:34',3,5 UNION ALL
SELECT '0:35',2,6 UNION ALL
SELECT '0:37',1,5
-- One row per time, one column per device; the quirky update relies on the primary key order
CREATE TABLE #tmpResults
(
[Time] Time primary key,
[1] FLOAT,
[2] FLOAT,
[3] FLOAT
)
INSERT INTO #tmpResults
SELECT [Time],[1],[2],[3]
FROM @SourceData
PIVOT ( MAX(value) FOR Device IN ([1],[2],[3])) AS pvt
ORDER BY [Time];
-- Quirky update: carry the last non-NULL value of each device forward
DECLARE @1 FLOAT, @2 FLOAT, @3 FLOAT
UPDATE #tmpResults
SET @1 = [1] = ISNULL([1],@1),
@2 = [2] = ISNULL([2],@2),
@3 = [3] = ISNULL([3],@3)
-- AVG ignores NULLs, so devices that have not reported yet are left out
SELECT [Time],
(SELECT AVG(device)
FROM (SELECT [1] AS device
UNION ALL
SELECT [2]
UNION ALL
SELECT [3]) t) AS [Average value]
FROM #tmpResults
DROP TABLE #tmpResults
One of the possible solutions I found is far more efficient (less than a second for 14.574 lines). I haven't yet had time to review the results in detail, but at first glance it looks promising. This is the code for the 3 device example:
SELECT Time,
SUM(CASE MAC WHEN '1' THEN Stock ELSE 0 END) Device1Value,
SUM(CASE MAC WHEN '2' THEN Stock ELSE 0 END) Device2Value,
SUM(CASE MAC WHEN '3' THEN Stock ELSE 0 END) Device3Value
FROM dbo.Event
GROUP BY Time
ORDER BY Time
In any case I'll test the code provided by Martin to see if it makes any difference to the results.
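For comparison, a documented alternative that does not hard-code one column per device is to look up the latest value per device with CROSS APPLY and average the results. The sketch below is only an illustration: it loads the sample data from the question into a hypothetical table variable @Event with the Time, Device and Value columns shown there (not the Stock and MAC names used in the later queries), and its performance on the real table has not been measured.

```sql
-- Sample data from the question, in a hypothetical table variable.
DECLARE @Event TABLE ([Time] time, Device int, Value float);
INSERT INTO @Event ([Time], Device, Value) VALUES
    ('0:00', 1, 2), ('0:01', 2, 3), ('0:03', 3, 5),
    ('0:03', 1, 3), ('0:13', 2, 5), ('0:22', 1, 7),
    ('0:34', 3, 5), ('0:35', 2, 6), ('0:37', 1, 5);

SELECT t.[Time],
       AVG(v.Value) AS AverageValue
FROM (SELECT DISTINCT [Time] FROM @Event) AS t
CROSS JOIN (SELECT DISTINCT Device FROM @Event) AS d
CROSS APPLY (
    -- Latest value reported by this device at or before this time.
    SELECT TOP (1) e.Value
    FROM @Event AS e
    WHERE e.Device = d.Device
      AND e.[Time] <= t.[Time]
    ORDER BY e.[Time] DESC
) AS v
GROUP BY t.[Time]
ORDER BY t.[Time];
```

CROSS APPLY drops device and time combinations where the device has not reported yet, so the rows for 0:00 and 0:01 average only the devices seen so far, and from 0:03 onward the values match the Excel figures in the question (for example 11/3 at 0:03). On a real table, an index on (Device, Time) would likely be needed for the TOP (1) lookup to stay cheap.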
