I have no idea how to name the question properly, but here is sample data:
CREATE TABLE dbo.test_data
(
row_version VARBINARY(8)
, account_number CHAR(8)
, account_balance DECIMAL(10, 2)
, group_rank BIGINT
, rownum BIGINT
);
INSERT INTO dbo.test_data
VALUES (0x000000000013fd24, '46436663', 123.00, 4, 86)
, (0x000000000013fd23, '46436663', 123.00, 4, 86)
, (0x000000000013fd22, '46436663', 123.00, 4, 85)
, (0x000000000013fd21, '46436663', 123.00, 4, 85)
, (0x000000000013fd20, '46436663', 123.00, 4, 83)
, (0x000000000013fd1f, '46436663', 555.00, 2, 83)
, (0x000000000013fd21, '46436663', 123.00, 4, 85)
, (0x000000000013fd20, '46436663', 123.00, 4, 83)
, (0x000000000013fd21, '46436663', 123.00, 4, 85)
, (0x000000000013fd20, '46436663', 123.00, 4, 83)
, (0x000000000013fd1e, '46436664', 12345.00, 5, 82)
, (0x000000000013fd1d, '46436664', 12345.00, 5, 82)
, (0x000000000013fd1c, '46436664', 12345.00, 5, 82)
, (0x000000000013fd1b, '46436664', 12345.00, 5, 81)
, (0x000000000013fd1a, '46436664', 12345.00, 5, 81)
, (0x000000000013fd19, '46436664', 12345.00, 5, 78)
, (0x000000000013fcb3, '46436664', 123.00, 6, 77)
, (0x000000000013fcb2, '46436664', 123.00, 6, 77)
, (0x000000000013fcb1, '46436664', 123.00, 6, 76)
, (0x000000000013fcb0, '46436664', 123.00, 6, 76);
This is what the data looks like:
SELECT * FROM dbo.test_data
ORDER BY row_version DESC
Here 1 and 4 (blue) are sequential groups: runs of at least two consecutive rows with the same group_rank when you order the data by row_version. I need to find the first occurrence of a different group_rank (2, red) and then compare two rownum values (3 and 6, purple): MIN(rownum) for the upper group (blue) and the rownum of the record that comes just before the group ended (red). If these values differ by 1, I need to return the account_number; otherwise I don't need to return it.
I'm not interested in what happens below points 2 and 5 (red) for the accounts.
So, looking at this data, the only account that should be returned is 46436664, because for 46436663 the rownum value is the same (83).
Interesting problem. I guess the first step is finding the problem account_numbers, which you could do like this:
select
*
from
dbo.test_data t1
cross apply (
select top 1
*
from
dbo.test_data
where
account_number = t1.account_number
and rownum = t1.rownum
and row_version > t1.row_version
order by
row_version asc) t2
where
t1.group_rank <> t2.group_rank
order by
t1.row_version;
Then you could exclude those accounts from the full list like this:
select distinct
t0.account_number
from
test_data t0
except
select distinct
t1.account_number
from
dbo.test_data t1
cross apply (
select top 1
*
from
dbo.test_data
where
account_number = t1.account_number
and rownum = t1.rownum
and row_version > t1.row_version
order by
row_version asc) t2
where
t1.group_rank <> t2.group_rank
I have the following table:
create table public.dctable
(
prod int,
customer varchar(100),
city varchar(100),
num int,
tim datetime,
dc smallint
);
insert into dctable
values (1, 'Jim', 'Venice', 5, '2015-08-27 1:10:00', 0),
(1, 'Jim', 'Venice', 5, '2015-08-27 1:10:15', 0),
(1, 'Jim', 'Venice', 5, '2015-08-27 1:10:28', 0),
(4, 'Jane', 'Vienna', 8, '2018-06-04 2:20:43', 0),
(4, 'Jane', 'Vienna', 8, '2018-06-04 2:20:45', 0),
(4, 'Jane', 'Vienna', 8, '2018-06-04 2:20:49', 0),
(4, 'Jane', 'Vienna', 8, '2018-06-04 2:30:55', 0),
(7, 'Jack', 'Vilnius', 4, '2015-09-15 2:20:55', 0),
(7, 'Jake', 'Vigo', 9, '2018-01-01 10:20:05', 0),
(7, 'Jake', 'Vigo', 2, '2018-01-01 10:20:25', 0);
Now I want to update the column dc to the value of tdc in this query:
select
t.*,
(case
when lead(tim) over (partition by prod, customer, city, num order by tim) <= dateadd(second, 30, tim)
then 1
else 0
end) as tdc
from
public.dctable t
So I have tried this:
update public.dctable
set dc = b.tdc
from
(select
t.*,
(case
when lead(tim) over (partition by prod, customer, city, num order by tim) <= dateadd(second, 30, tim)
then 1
else 0
end) as tdc
from
public.dctable t) b
where
public.dctable.prod = b.prod
and public.dctable.customer = b.customer
and public.dctable.city = b.city
and public.dctable.num = b.num;
But when I query the results, dc is still 0 for all rows.
select * from public.dctable;
prod  customer  city     num  tim                  dc
------------------------------------------------------
1     Jim       Venice   5    2015-08-27 01:10:00  0
1     Jim       Venice   5    2015-08-27 01:10:28  0
1     Jim       Venice   5    2015-08-27 01:10:15  0
4     Jane      Vienna   8    2018-06-04 02:20:49  0
4     Jane      Vienna   8    2018-06-04 02:20:45  0
4     Jane      Vienna   8    2018-06-04 02:30:55  0
4     Jane      Vienna   8    2018-06-04 02:20:43  0
7     Jake      Vigo     2    2018-01-01 10:20:25  0
7     Jack      Vilnius  4    2015-09-15 02:20:55  0
7     Jake      Vigo     9    2018-01-01 10:20:05  0
How can I get it to update the column dc to the value of tdc from the inner query above?
Thanks
This seems to be what you want.
update d
set d.dc = b.dc2
from dctable d
inner join
(select
*,
dc2 = case
when lead(tim) over (partition by prod, customer, city, num order by tim) <= dateadd(second, 30, tim)
then 1
else 0
end
from dctable) b on
d.prod = b.prod
and d.customer = b.customer
and d.city = b.city
--and d.tim = b.tim --you may also want this join clause.
and d.num = b.num;
select * from dctable
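A possible reason the original UPDATE appears to do nothing: joining only on prod, customer, city and num matches each target row to every row of its partition in b, and when several source rows match one target row, SQL Server applies just one of them without guaranteeing which. A minimal sketch of an alternative, assuming SQL Server 2012 or later (for LEAD) and that updating through a CTE is acceptable, computes tdc next to dc so that no join is needed. The CTE name flagged is made up for this example.

```sql
-- Sketch only: "flagged" is a hypothetical CTE name, not from the original post.
;with flagged as
(
    select
        dc,
        -- 1 when the next row of the same prod/customer/city/num
        -- arrives within 30 seconds of this one, otherwise 0
        case
            when lead(tim) over (partition by prod, customer, city, num order by tim) <= dateadd(second, 30, tim)
            then 1
            else 0
        end as tdc
    from
        public.dctable
)
update flagged
set dc = tdc;   -- writes through the CTE to public.dctable.dc
```

Because every row carries its own tdc, the result does not depend on which of several joined rows the engine happens to pick.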
I have an example on sql fiddle. What I am trying to do is divide the overall COUNT(DISTINCT ID) by the weekly COUNT(DISTINCT ID). For example, the following is a conceptual setup of what the result should be.
year  week  id_set       overall_distinct  week_distinct  result
2016  1     A,A,A,B,B,C  0                 3              0
2016  2     A,B,C,C,D    1                 4              .25
2016  3     A,B,C,E,F    2                 5              .4
The table linked to on sql fiddle has the following schema. Also, in reality I do have multiple values for 'year'.
CREATE TABLE all_ids
([year] int, [week] int, [id] varchar(57))
;
INSERT INTO all_ids
([year], [week], [id])
VALUES
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'B'),
(2016, 1, 'B'),
(2016, 1, 'C'),
(2016, 2, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C'),
(2016, 2, 'C'),
(2016, 2, 'D'),
(2016, 3, 'A'),
(2016, 3, 'B'),
(2016, 3, 'C'),
(2016, 3, 'E'),
(2016, 3, 'F')
;
Edit
I apologize for the confusion. The above table was just a conceptual example of the result. The actual result only needs to look like the following.
year  week  overall_distinct  week_distinct  result
2016  1     0                 3              0
2016  2     1                 4              .25
2016  3     2                 5              .4
There is no need to include id_set.
I used dense_rank and max() over () to simulate count(distinct ...) with window functions. You could also try to do it with another subquery.
select
year, week
, id_set = stuff((
select
',' + a.id
from
all_ids a
where
a.year = t.year
and a.week = t.week
order by a.id
for xml path('')
), 1, 1, '')
, overall_distinct = count(case when cnt = 1 then 1 end)
, week_distinct = count(distinct id)
, result = cast(count(case when cnt = 1 then 1 end) * 1.0 / count(distinct id) as decimal(10, 2))
from (
select
year, week, id, cnt = max(dr) over (partition by id)
from (
select
*, dr = dense_rank() over (partition by id order by year, week)
From
all_ids
) t
) t
group by year, week
Output
year  week  id_set       overall_distinct  week_distinct  result
----------------------------------------------------------------
2016  1     A,A,A,B,B,C  0                 3              0.00
2016  2     A,B,C,C,D    1                 4              0.25
2016  3     A,B,C,E,F    2                 5              0.40
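If id_set is not needed, as the edit to the question says, the same numbers can be sketched without the string aggregation. This is only an illustration, assuming that overall_distinct means "ids that appear in no other week"; the CTE names id_weeks and flagged and the column weeks_seen are made up for the example.

```sql
-- Sketch only: id_weeks, flagged and weeks_seen are hypothetical names.
;with id_weeks as
(
    -- one row per id per (year, week), so repeats inside a week count once
    select distinct [year], [week], id
    from all_ids
),
flagged as
(
    select
        [year], [week], id,
        -- in how many distinct (year, week) pairs this id occurs
        count(*) over (partition by id) as weeks_seen
    from id_weeks
)
select
    [year], [week]
    , overall_distinct = sum(case when weeks_seen = 1 then 1 else 0 end)
    , week_distinct = count(*)
    , result = cast(sum(case when weeks_seen = 1 then 1 else 0 end) * 1.0 / count(*) as decimal(10, 2))
from flagged
group by [year], [week]
order by [year], [week];
```

On the sample data this should give 0 / 3, 1 / 4 and 2 / 5 for weeks 1 to 3, matching the table in the question.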
This would be one way, probably not the best one:
;with weekly as
(
select year, week, count(distinct id) nr
from all_ids
group by year, week
),
overall as
(
select a.week, count(distinct a.id) nr
from all_ids a
where a.id not in (select id from all_ids where week <> a.week and id = a.id )
group by week
)
select distinct a.year
, a.week
, stuff((select ', ' + id
from all_ids
where year = a.year and week = a.week
for xml path('')), 1, 2, '') ids -- strip the leading ', ' (two characters)
, w.Nr weeklyDistinct
, isnull(t.Nr, 0) overallDistinct
from all_ids a join weekly w on a.year = w.year and a.week = w.week
left join overall t on t.week = a.week
One statement count only
create table #t (y int, w int, id varchar(57));
INSERT #t (y, w, id)
VALUES
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'B'),
(2016, 1, 'B'),
(2016, 1, 'C'),
(2016, 2, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C'),
(2016, 2, 'C'),
(2016, 2, 'D'),
(2016, 3, 'A'),
(2016, 3, 'B'),
(2016, 3, 'C'),
(2016, 3, 'E'),
(2016, 3, 'F');
select t1.w, count(distinct t1.id) as wk
, (count(distinct t1.id) - count(distinct t2.id)) as [all]
, (cast(1 as smallmoney) - cast(count(distinct t2.id) as smallmoney) / count(distinct t1.id)) as [frac]
from #t t1
left join #t t2
on t2.id = t1.id
and t2.w <> t1.w
group by t1.w
order by t1.w;
I have a requirement to copy a previous row's value to all following rows in SQL Server.
With the LAG function I can achieve this only for the next row, but I have to carry the value across more than one row.
Here is a sample example:
Query:
SELECT
t1.ID,
t1.CustID,
t2.ID,
t2.CustID,
t2.Flag,
COALESCE(
t2.Flag,
(
SELECT TOP 1 l.Flag
FROM TBL2 l
WHERE l.CustID = t1.CustID AND l.ID < t1.ID
ORDER BY l.ID desc
)) as 'Final'
FROM
TBL1 t1
LEFT OUTER JOIN TBL2 t2 ON t2.ID = t1.ID AND t2.CustID = t1.CustID
ORDER BY
t1.CustID,
t1.ID desc
Setup:
CREATE TABLE TBL1 (ID int, CustID int)
GO
CREATE TABLE TBL2 (ID int, CustID int, Flag bit)
GO
INSERT INTO TBL1 (ID, CustID)
SELECT 1, 11 UNION ALL
SELECT 1, 12 UNION ALL
SELECT 1, 13 UNION ALL
SELECT 1, 14 UNION ALL
SELECT 1, 15 UNION ALL
SELECT 1, 16 UNION ALL
SELECT 1, 17 UNION ALL
SELECT 2, 11 UNION ALL
SELECT 2, 12 UNION ALL
SELECT 2, 13 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 2, 15 UNION ALL
SELECT 2, 16 UNION ALL
SELECT 2, 17 UNION ALL
SELECT 3, 11 UNION ALL
SELECT 3, 12 UNION ALL
SELECT 3, 13 UNION ALL
SELECT 3, 14 UNION ALL
SELECT 3, 15 UNION ALL
SELECT 3, 16 UNION ALL
SELECT 3, 17 UNION ALL
SELECT 4, 11 UNION ALL
SELECT 4, 12 UNION ALL
SELECT 4, 13 UNION ALL
SELECT 4, 14 UNION ALL
SELECT 4, 15 UNION ALL
SELECT 4, 16 UNION ALL
SELECT 4, 17
GO
INSERT INTO TBL2 (ID, CustID, Flag)
SELECT 1, 11, 0 UNION ALL
SELECT 1, 12, 1 UNION ALL
SELECT 1, 13, 1 UNION ALL
SELECT 1, 14, 0 UNION ALL
SELECT 1, 15, 0 UNION ALL
SELECT 1, 16, 0 UNION ALL
SELECT 1, 17, 1 UNION ALL
SELECT 2, 11, 1 UNION ALL
SELECT 2, 13, 0 UNION ALL
SELECT 2, 14, 1 UNION ALL
SELECT 2, 15, 1 UNION ALL
SELECT 2, 17, 0 UNION ALL
SELECT 3, 13, 1 UNION ALL
SELECT 3, 15, 0 UNION ALL
SELECT 3, 17, 1 UNION ALL
SELECT 4, 12, 0 UNION ALL
SELECT 4, 17, 0
GO
You can try this:
SELECT *
,ISNULL(b.flag,(MAX(b.flag) OVER (PARTITION BY a.[custid] ORDER BY b.[id] DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))
FROM #TBL1 A
LEFT JOIN #TBL2 B
ON A.id = b.id
AND A.custid = b.custid
ORDER BY a.[custid] ASC, a.[id] DESC
Here is the sample data:
CREATE TABLE #TBL1
(
[id] TINYINT
,[custid] TINYINT
);
CREATE TABLE #TBL2
(
[id] TINYINT
,[custid] TINYINT
,[flag] TINYINT
);
INSERT INTO #TBL1([id], [custid])
VALUES
(1,11)
,(1,12)
,(1,13)
,(1,14)
,(1,15)
,(1,16)
,(1,17)
,(2,11)
,(2,12)
,(2,13)
,(2,14)
,(2,15)
,(2,16)
,(2,17)
,(3,11)
,(3,12)
,(3,13)
,(3,14)
,(3,15)
,(3,16)
,(3,17)
,(4,11)
,(4,12)
,(4,13)
,(4,14)
,(4,15)
,(4,16)
,(4,17);
INSERT INTO #TBL2 ([id], [custid], [flag])
VALUES (1,11,0)
,(1,12,1)
,(1,13,1)
,(1,14,0)
,(1,15,0)
,(1,16,0)
,(1,17,1)
,(2,11,1)
,(2,13,0)
,(2,14,1)
,(2,15,1)
,(2,17,0)
,(3,13,1)
,(3,15,0)
,(3,17,1)
,(4,12,0)
,(4,17,0)
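One thing to watch with the MAX(...) OVER approach: a running MAX returns the largest earlier flag, not necessarily the most recent one. For CustID 15, for example, the flag at ID 3 is 0 while an earlier flag is 1, so the two can differ at ID 4. A minimal sketch of a "last known value" variant, written against TBL1 and TBL2 from the question and assuming SQL Server 2012 or later, is shown below. The names joined and grp are made up for the example, and Flag is cast to tinyint because MAX does not accept bit.

```sql
-- Sketch only: "joined" and "grp" are hypothetical names.
;WITH joined AS
(
    SELECT
        t1.ID,
        t1.CustID,
        CAST(t2.Flag AS tinyint) AS Flag,
        -- COUNT ignores NULLs, so grp only advances on rows that have a flag;
        -- each flagged row and the unflagged rows after it share one grp value
        COUNT(t2.Flag) OVER (PARTITION BY t1.CustID
                             ORDER BY t1.ID
                             ROWS UNBOUNDED PRECEDING) AS grp
    FROM TBL1 t1
    LEFT OUTER JOIN TBL2 t2
        ON t2.ID = t1.ID AND t2.CustID = t1.CustID
)
SELECT
    ID,
    CustID,
    Flag,
    -- each grp holds exactly one non-NULL flag, so MAX simply returns it
    MAX(Flag) OVER (PARTITION BY CustID, grp) AS Final
FROM joined
ORDER BY CustID, ID DESC;
```

Rows that come before the first flag of a customer stay NULL, the same as with the COALESCE and TOP 1 query in the question.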
You can use a LEFT JOIN, COALESCE and a correlated subquery:
SELECT t.id,t.custid,s.id,s.custid,
COALESCE(s.flag,(SELECT TOP 1 ss.flag FROM tbl2 ss
WHERE ss.custid = t.custid and ss.id < t.id
ORDER BY ss.id DESC)) as flag
FROM Tbl1 t
LEFT JOIN tbl2 s
ON(t.custid = s.custid and s.id = t.id)
I have a table with a primary key (bigint), a datetime, a value, and a foreign key to a configuration table; it consists of hundreds of thousands of rows. I want to be able to obtain one row per variable time interval. For example:
Select Timestamp, value from myTable where configID=3
AND{most recent for 15 min interval}
I have a CTE query that returns multiple rows for the interval:
WITH Time_Interval(timestamp, value, minutes)
AS
(
Select timestamp, value, DatePart(Minute, Timestamp) from myTable
Where Timestamp >= '12/01/2012' and Timestamp <= 'Jan 10, 2013' and
ConfigID = 435 and (DatePart(Minute, Timestamp) % 15) = 0
)
Select Timestamp, value, minutes from Time_Interval
group by minutes, value, timestamp
order by Timestamp
such as:
2012-12-19 18:15:22.040 6.98 15
2012-12-19 18:15:29.887 6.98 15
2012-12-19 18:15:33.480 7.02 15
2012-12-19 18:15:49.370 7.01 15
2012-12-19 18:30:41.920 6.95 30
2012-12-19 18:30:52.437 6.93 30
2012-12-19 19:15:18.467 7.13 15
2012-12-19 19:15:34.250 7.11 15
2012-12-19 19:15:49.813 7.12 15
But as can be seen, there are 4 rows for the first 15-minute interval, 2 for the next interval, etc. Worse, if no data was obtained at an exact timestamp of 15 minutes, then there will be no value.
What I want is the most recent value for a fifteen-minute interval, even if the only data for that interval occurred 1 second after the start of the interval.
I was thinking of LEAD/OVER, but again, the rows are not organized that way. The primary key is a bigint and is a clustered index. Both the timestamp column and the ConfigID column are indexed. The above query returns 4583 rows in under a second.
Thanks for any help.
Try this on for size. It will even handle returning one row for instances when you have multiple timestamps for a given interval.
NOTE: This assumes your Bigint PK column is named: idx. Just substitute where you see "idx" if it is not.
;WITH Interval_Helper([minute],minute_group)
AS
(
SELECT 0, 1 UNION SELECT 1, 1 UNION SELECT 2, 1 UNION SELECT 3, 1 UNION SELECT 4, 1
UNION SELECT 5, 1 UNION SELECT 6, 1 UNION SELECT 7, 1 UNION SELECT 8, 1 UNION SELECT 9, 1
UNION SELECT 10, 1 UNION SELECT 11, 1 UNION SELECT 12, 1 UNION SELECT 13, 1 UNION SELECT 14, 1
UNION SELECT 15, 2 UNION SELECT 16, 2 UNION SELECT 17, 2 UNION SELECT 18, 2 UNION SELECT 19, 2
UNION SELECT 20, 2 UNION SELECT 21, 2 UNION SELECT 22, 2 UNION SELECT 23, 2 UNION SELECT 24, 2
UNION SELECT 25, 2 UNION SELECT 26, 2 UNION SELECT 27, 2 UNION SELECT 28, 2 UNION SELECT 29, 2
UNION SELECT 30, 3 UNION SELECT 31, 3 UNION SELECT 32, 3 UNION SELECT 33, 3 UNION SELECT 34, 3
UNION SELECT 35, 3 UNION SELECT 36, 3 UNION SELECT 37, 3 UNION SELECT 38, 3 UNION SELECT 39, 3
UNION SELECT 40, 3 UNION SELECT 41, 3 UNION SELECT 42, 3 UNION SELECT 43, 3 UNION SELECT 44, 3
UNION SELECT 45, 4 UNION SELECT 46, 4 UNION SELECT 47, 4 UNION SELECT 48, 4 UNION SELECT 49, 4
UNION SELECT 50, 4 UNION SELECT 51, 4 UNION SELECT 52, 4 UNION SELECT 53, 4 UNION SELECT 54, 4
UNION SELECT 55, 4 UNION SELECT 56, 4 UNION SELECT 57, 4 UNION SELECT 58, 4 UNION SELECT 59, 4
)
,Time_Interval([timestamp], value, [date], [hour], minute_group)
AS
(
SELECT A.[Timestamp]
,A.value
,CONVERT(smalldatetime, CONVERT(char(10), A.[Timestamp], 101))
,DATEPART(HOUR, A.[Timestamp])
,B.minute_group
FROM myTable A
JOIN Interval_Helper B
ON (DATEPART(minute, A.[Timestamp])) = B.[minute]
AND A.[Timestamp] >= '12/01/2012'
AND A.[Timestamp] <= '01/10/2013'
AND A.ConfigID = 435
)
,Time_Interval_TimeGroup([date], [hour], [minute], MaxTimestamp)
AS
(
SELECT [date]
,[hour]
,minute_group
,MAX([Timestamp]) as MaxTimestamp
FROM Time_Interval
GROUP BY [date]
,[hour]
,minute_group
)
,Time_Interval_TimeGroup_Latest(MaxTimestamp, MaxIdx)
AS
(
SELECT MaxTimestamp
,MAX(idx) as MaxIdx
FROM myTable A
JOIN Time_Interval_TimeGroup B
ON A.[Timestamp] = B.MaxTimestamp
GROUP BY MaxTimestamp
)
SELECT A.*
FROM myTable A
JOIN Time_Interval_TimeGroup_Latest B
ON A.idx = B.MaxIdx
ORDER BY A.[timestamp]
This is another take on the clever time group function from @MntManChris below:
CREATE FUNCTION dbo.fGetTimeGroup (@DatePart tinyint, @Date datetime)
RETURNS int
AS
BEGIN
RETURN CASE @DatePart
WHEN 1 THEN DATEPART(mi, @Date)
WHEN 2 THEN DATEPART(mi, @Date)/5 + 1 -- 5 min
WHEN 3 THEN DATEPART(mi, @Date)/15 + 1 -- 15 min
WHEN 4 THEN DATEPART(mi, @Date)/30 + 1 -- 30 min
WHEN 5 THEN DATEPART(hh, @Date) -- hr
WHEN 6 THEN DATEPART(hh, @Date)/6 + 1 -- 6 hours
WHEN 7 THEN DATEPART(hh, @Date)/12 + 1 -- 12 hours
WHEN 8 THEN DATEPART(d, @Date) -- day
ELSE -1
END
END
If you want to partition into 15-minute intervals, use DATEDIFF in minutes and divide by 15.
Then use that partition to rank the rows within each interval.
WITH myTbl AS
(
SELECT
timestamp, value,
RANK() OVER (PARTITION BY (DATEDIFF(Mi,0, Timestamp)/15) ORDER BY Timestamp desc) RK
FROM myTable
--WHERE Timestamp BETWEEN '' AND ''
)
SELECT * FROM myTbl
WHERE RK <= 1
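To make the bucketing concrete, here is a small self-contained sketch, assuming SQL Server 2008 or later. It uses four of the sample rows from the question; the names sample_rows, ranked, interval_start and rn are made up for the example. It uses ROW_NUMBER() instead of RANK() so that two rows sharing the same latest timestamp would still yield one row per interval.

```sql
-- Sketch only: sample_rows, ranked, interval_start and rn are hypothetical names.
;WITH sample_rows([Timestamp], value) AS
(
    SELECT CAST(ts AS datetime), v
    FROM (VALUES ('2012-12-19T18:15:22.040', 6.98),
                 ('2012-12-19T18:15:49.370', 7.01),
                 ('2012-12-19T18:30:41.920', 6.95),
                 ('2012-12-19T18:30:52.437', 6.93)) AS s(ts, v)
),
ranked AS
(
    SELECT
        [Timestamp], value,
        -- integer division drops the remainder, so multiplying back by 15
        -- gives the start of the 15-minute interval the row falls into
        DATEADD(minute, (DATEDIFF(minute, 0, [Timestamp]) / 15) * 15, 0) AS interval_start,
        -- number the rows of each interval, newest first
        ROW_NUMBER() OVER (PARTITION BY DATEDIFF(minute, 0, [Timestamp]) / 15
                           ORDER BY [Timestamp] DESC) AS rn
    FROM sample_rows
)
SELECT interval_start, [Timestamp], value
FROM ranked
WHERE rn = 1
ORDER BY [Timestamp];
```

With these four rows the query should keep the 18:15:49.370 row for the interval starting at 18:15 and the 18:30:52.437 row for the interval starting at 18:30. Against the real table you would also filter on ConfigID and the date range, as in the question.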
As my comment above says, I've used Rob's answer but implemented a user function to eliminate the Interval_Helper table and the first join. Here is the code for the user function.
BEGIN
DECLARE @Ans integer
if @DatePart = 1 -- min
return DATEPART(mi, @Date)
if @DatePart = 2 -- 5 min
return DatePart(mi,@Date)/5 + 1
if @DatePart = 3 -- 15 min
return DatePart(mi,@Date)/15 + 1
if @DatePart = 4 -- 30min
return DatePart(mi,@Date)/30 + 1
if @DatePart = 5 -- hr
return DATEPART(hh, @Date)
if @DatePart = 6 -- 6 hours
return DATEPART(hh, @Date)/6 + 1
if @DatePart = 7 -- 12 hours
return DATEPART(hh, @Date)/12 + 1
if @DatePart = 8 -- day
return DATEPART(d, @Date)
return -1
END
This then made the Time_Interval table look like
;WITH Time_Interval([timestamp], value, [date], [day], time_group)
AS
(
SELECT A.[Timestamp]
,A.value
,CONVERT(smalldatetime, CONVERT(char(10), A.[Timestamp], 101))
,DATEPART(dd, A.[Timestamp])
,dbo.fGetTimeGroup(@tInterval, A.[Timestamp]) as 'time_group'
FROM myTable A
where
A.[Timestamp] >= '12/01/2012'
AND A.[Timestamp] <= '01/10/2013'
AND A.ConfigID= 435
)
Since there is a switch from "hours" to "days" as @TimeInterval goes from 1hr to 6hr, 12hr or every day, I also had to have the Time_Interval_TimeGroup table switch from grouping by [hour] to grouping by [day], and of course have this in the select list.
Since this is part of a much larger abstract DB schema where both the table in question and the db are functions of the ConfigID and thus required dynamic SQL, implementing this switch in grouping was not an issue; I simply implemented two different dynSql sections based on the value of @TimeInterval.
Thanks