Identify ranges in groups which are identified by a flag - sql-server

I have the following table:
create table #table (dates int, is_missing tinyint, group_id numeric(18))
insert into #table(dates,is_missing,group_id)
select 20110719,0,1
union all
select 20110720,0,1
union all
select 20110721,0,1
union all
select 20110722,1,1
union all
select 20110723,0,1
union all
select 20110724,0,1
union all
select 20110725,0,1
union all
select 20110726,1,1
union all
select 20110727,0,1
union all
select 20110728,1,1
union all
select 20110723,1,3
union all
select 20110724,0,3
union all
select 20110725,0,3
union all
select 20110726,1,3
union all
select 20110727,0,3
select * from #table
order by group_id, dates
What I am trying to do is to return ranges of dates for each group which are identified by the missing day flag. To make this more clear the results of the query will have to look like this:
group_id  start_date  end_date  days_count
1         20110719    20110721  3
1         20110723    20110725  3
1         20110727    20110727  1
3         20110724    20110725  2
3         20110727    20110727  1
The is_missing flag basically separates the ranges within each group. It marks a date as missing, so the runs of dates between is_missing flags are the ranges I am trying to find, along with each range's start date, end date, and day count.
Is there a simple way to do this?
Thanks a lot.

Here is a possible solution using a Common Table Expression (CTE) and DENSE_RANK(). This type of problem is known as "gaps and islands". Using the concept from this Stack Overflow question: sql group by only rows which are in sequence, the following query was formulated to produce the desired output against the data you provided.
This query works correctly if the data stored in the table is ordered by group_id and dates columns. I assume that is the case with your data. If not, you might need to tweak the solution.
Modified the query as per suggestions provided by Andriy M. Thanks to Andriy M.
The query has been changed so that it produces correct output even if the date values in the table are not in sequence. The question stores the date values in an int column instead of a date type, so two queries are provided below. The first query works if the table stores the date values as int, and the second works if the table stores them as datetime or date.
This query will work only in SQL Server versions 2005 and above. Since you have tagged your question under sql-server-2008, I think this should work for you.
Screenshot #1 displays the data stored in the table. Screenshot #2 displays the output of the below mentioned queries against the table data.
Hope that helps.
Query for date values stored in int data type:
WITH cte AS
(
SELECT datenumeric
, is_missing
, group_id
, datenumeric
- DENSE_RANK() OVER (PARTITION BY is_missing ORDER BY group_id, datenumeric) AS partition_grp
FROM dbo.table_data
)
SELECT cte.group_id
, MIN(cte.datenumeric) AS start_date
, MAX(cte.datenumeric) AS end_date
, COUNT(cte.datenumeric) AS days_count
FROM cte
WHERE cte.is_missing = 0
GROUP BY cte.group_id
, cte.partition_grp
ORDER BY cte.group_id
, cte.partition_grp;
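To see why subtracting the rank groups consecutive dates, it can help to look at the intermediate values. The following is only an illustrative sketch: it inlines a few of the question's rows for group 1 with a VALUES constructor (SQL Server 2008 and above) instead of reading dbo.table_data, and the names sample_rows and rnk are made up for the example.

```sql
-- Illustration only: a handful of the question's rows for group 1, inlined.
WITH sample_rows (datenumeric, is_missing, group_id) AS
(
    SELECT *
    FROM (VALUES
        (20110719, 0, 1), (20110720, 0, 1), (20110721, 0, 1),
        (20110722, 1, 1),
        (20110723, 0, 1), (20110724, 0, 1), (20110725, 0, 1)
    ) AS v (datenumeric, is_missing, group_id)
)
SELECT datenumeric
     , is_missing
     , DENSE_RANK() OVER (PARTITION BY is_missing ORDER BY group_id, datenumeric) AS rnk
     , datenumeric
       - DENSE_RANK() OVER (PARTITION BY is_missing ORDER BY group_id, datenumeric) AS partition_grp
FROM sample_rows
ORDER BY datenumeric;
-- For the is_missing = 0 rows:
--   20110719..20110721 get ranks 1..3, so partition_grp = 20110718 for all three
--   20110723..20110725 get ranks 4..6, so partition_grp = 20110719 for all three
```

Each unbroken run of dates shares one partition_grp value, which is what the GROUP BY relies on. Note that plain integer subtraction only treats values as consecutive within a single month (20110731 and 20110801 differ by 70), which the DATEDIFF-based query for date-typed columns avoids.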
Query for date values stored in datetime or date data type:
WITH cte AS
(
SELECT datevalue
, is_missing
, group_id
, DATEDIFF(DAY, 0, datevalue)
- DENSE_RANK() OVER (PARTITION BY is_missing ORDER BY group_id, datevalue) AS partition_grp
FROM dbo.table_data
)
SELECT cte.group_id
, MIN(cte.datevalue) AS start_date
, MAX(cte.datevalue) AS end_date
, COUNT(cte.datevalue) AS days_count
FROM cte
WHERE cte.is_missing = 0
GROUP BY cte.group_id
, cte.partition_grp
ORDER BY cte.group_id
, cte.partition_grp;
Screenshot #1:
Screenshot #2:

Many thanks to Siva for the nice solution. It occurred to me that if a date were missing from the data entirely, the query would fail,
so I modified the query a little and used ROW_NUMBER() to fix that.
WITH cte AS
(
SELECT dates
, is_missing
, group_id
, ROW_NUMBER() OVER (ORDER BY group_id, dates)
- DENSE_RANK() OVER (PARTITION BY is_missing ORDER BY group_id, dates) AS partition_id
FROM dbo.table_data
)
SELECT group_id
, MIN(dates) AS start_date
, MAX(dates) AS end_date
, COUNT(*) AS days_count
FROM cte
WHERE is_missing = 0
GROUP BY group_id
, partition_id
ORDER BY group_id
, partition_id;
Or maybe a missing date will never happen. :)
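As a rough check of that case, here is a small sketch with one date absent from the data altogether (no row and no flag for 20110721). The rows and the CTE names sample_rows and numbered are invented for the illustration; the ranking expression is the one from the query above.

```sql
-- Illustration only: 20110721 has no row at all, 20110723 is flagged as missing.
WITH sample_rows (dates, is_missing, group_id) AS
(
    SELECT *
    FROM (VALUES
        (20110719, 0, 1), (20110720, 0, 1),
        (20110722, 0, 1), (20110723, 1, 1), (20110724, 0, 1)
    ) AS v (dates, is_missing, group_id)
), numbered AS
(
    SELECT dates
         , is_missing
         , group_id
         , ROW_NUMBER() OVER (ORDER BY group_id, dates)
           - DENSE_RANK() OVER (PARTITION BY is_missing ORDER BY group_id, dates) AS partition_id
    FROM sample_rows
)
SELECT group_id
     , MIN(dates) AS start_date
     , MAX(dates) AS end_date
     , COUNT(*)   AS days_count
FROM numbered
WHERE is_missing = 0
GROUP BY group_id, partition_id
ORDER BY group_id, start_date;
-- Returns two ranges:
--   1  20110719  20110722  3
--   1  20110724  20110724  1
```

With this version only the is_missing flag splits a range; an absent row does not. Whether that is the desired behavior depends on what an absent date is supposed to mean in the data.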

Related

comparing two dates in different AS columns

I have two columns with date values. I'd like to filter them to see the result only when the two columns have similar values.
I have two questions about the "Where" part. Can anyone help me with this?
1) How can I compare the values of these two columns with date values?
2) If I have varchar values instead of dates, how can I compare the two values?
SELECT [USERNAME], count(*) AS [NumberOfHappening], min([date1]) AS [FirstDate], max([date2]) AS [SecondDate]
FROM TableMain
WHERE CAST([FirstDate] AS DATE) = CAST([SecondDate] AS DATE)
GROUP BY [USERNAME]
ORDER BY 'NumberOfHappening' DESC
Thanks.
Are the original date values appropriately typed, or do you store date/time values in string columns? If so, you should really change this...
If I get this correctly, you want to find records where date1 and date2 are on the same day. Casting a DATETIME to DATE will get rid of the time portion.
You can use a CTE to use the column aliases directly:
;WITH cte AS
(
SELECT [USERNAME], count(*) AS [NumberOfHappening], min([date1]) AS [FirstDate], max([date2]) AS [SecondDate]
FROM TableMain
GROUP BY [USERNAME]
)
SELECT *
FROM cte
WHERE CAST([FirstDate] AS DATE) = CAST([SecondDate] AS DATE)
ORDER BY NumberOfHappening DESC;
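An alternative that avoids the CTE is to filter the aggregates in a HAVING clause, since WHERE is evaluated before the aliases and aggregates exist. This is only a sketch: the table variable @TableMain and its two sample rows are assumptions made up so the example runs on its own (SQL Server 2008 and above).

```sql
-- Hypothetical stand-in for TableMain, so the example is self-contained.
DECLARE @TableMain TABLE (USERNAME varchar(50), date1 datetime, date2 datetime);

INSERT INTO @TableMain (USERNAME, date1, date2)
VALUES ('ann', '2020-01-01T08:00:00', '2020-01-01T17:00:00')  -- same calendar day
     , ('bob', '2020-01-01T08:00:00', '2020-01-02T09:00:00'); -- different days

SELECT [USERNAME]
     , COUNT(*)     AS [NumberOfHappening]
     , MIN([date1]) AS [FirstDate]
     , MAX([date2]) AS [SecondDate]
FROM @TableMain
GROUP BY [USERNAME]
-- HAVING runs after grouping, so the aggregates can be compared here.
HAVING CAST(MIN([date1]) AS DATE) = CAST(MAX([date2]) AS DATE)
ORDER BY [NumberOfHappening] DESC;
-- Returns only the row for 'ann'.
```

The aggregate expressions have to be repeated in HAVING because the column aliases are not visible there; the CTE form avoids that repetition.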

order by on set 1 union set 2 sql

I have this situation:
select name,
subject
from Table_1
where date > getdate()-1
group by name, subject
order by id desc
union
select name,
subject
from table_2
where name not like 'abc%'
Table_1 and table_2 have a similar structure.
I need to apply an ORDER BY to SET 1 UNION SET 2.
This is not allowed in SQL Server; it says "ORDER BY items must appear in the select list". I don't understand what the problem is. I am selecting an equal number of columns in both queries; I only want the two result sets together.
(on SQL Server 2017)
Anybody help!!
Thanks in advance.
Elaborating on my comment
select name,
subject
from Table_1
where date > getdate()-1
--group by name, subject --this isn't needed
union
select name,
subject
from table_2
where name not like 'abc%'
order by <yourCol> desc --notice change here
And for a conditional order by, there are a few posts on that.
Also, you don't need the group by since union removes duplicates.
The error itself is clear, though: the column you want to order by must be contained in the select list.
If you want to keep the first set ordered before the second set, just use a static column:
select name,
subject,
1 as Sort
from Table_1
where date > getdate()-1
--group by name, subject --this isn't needed
union
select name,
subject,
2 as Sort
from table_2
where name not like 'abc%'
order by Sort asc--notice change here

StackExchange Query Help t-sql

Would anybody be able to help me with this exercise? I am used to querying PostgreSQL rather than T-SQL, and I am running into trouble with how some of my data aggregates.
My assignment requires me to:
Create a query that returns the number of comments made on each day for each post from the top 50 most commented on posts in the past year.
For example, this query below is giving me a non aggregated result set:
select cast(creationdate as date),
postid,
count(id)
from comments
where postid = 17654496
group by creationdate, postid
The schema is all here
https://data.stackexchange.com/stackoverflow/query/edit/898297
You can use a CTE to get the comment count per post per day,
then use the ROW_NUMBER window function to number the rows in descending order of count.
;with CTE as (
select cast(creationdate as date) dt,
postid,
count(id) cnt
from comments
WHERE creationdate between dateadd(year,-1,getdate()) and getdate()
group by cast(creationdate as date), postid
), CTE2 AS (
select *,ROW_NUMBER() OVER (order by cnt desc) rn
from CTE
)
SELECT *
FROM CTE2
WHERE rn <=50
https://data.stackexchange.com/stackoverflow/query/898322/test
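The query above ranks (day, post) pairs by their daily count. If the assignment is instead read as "find the 50 posts with the most comments in the past year, then break those posts down by day", one possible sketch is the following. It assumes the Comments table with the PostId, CreationDate and Id columns used in the question; the names top_posts, comment_day and comment_count are made up for the example, and ties at the 50th place are broken arbitrarily.

```sql
-- Step 1: the 50 posts with the most comments in the past year.
WITH top_posts AS
(
    SELECT TOP (50) PostId
    FROM Comments
    WHERE CreationDate >= DATEADD(YEAR, -1, GETDATE())
    GROUP BY PostId
    ORDER BY COUNT(*) DESC
)
-- Step 2: per-day comment counts, restricted to those posts.
SELECT c.PostId
     , CAST(c.CreationDate AS date) AS comment_day
     , COUNT(*)                     AS comment_count
FROM Comments AS c
JOIN top_posts AS t
  ON t.PostId = c.PostId
WHERE c.CreationDate >= DATEADD(YEAR, -1, GETDATE())
GROUP BY c.PostId, CAST(c.CreationDate AS date)
ORDER BY c.PostId, comment_day;
```

Grouping by CAST(CreationDate AS date) rather than by the raw CreationDate is what collapses the individual timestamps into one row per day.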

Order By not working on datetime 101 format

Create table #temp
(
OrderDate datetime
)
insert into #temp values ('01/21/2015'),('01/20/2014'),('11/12/2013')
select distinct convert(varchar(10),orderdate,101) as OrderDate from #temp
order by convert(varchar(10),orderdate,101) asc
The above query gives me the result like below:
OrderDate
01/20/2014
01/21/2015
11/12/2013
But I want the result like below:
OrderDate
11/12/2013
01/20/2014
01/21/2015
The above is just a sample on which I am trying to do sorting on format 101. In my actual query I need to use distinct keyword and also the columns will come dynamically in the select statement by using parameter.
I can't use group by in my actual query.
Please help.
UPDATE
Referring to your comments, the only way I've managed to get unique results with a single orderdate column converted to its VARCHAR style 101 representation, while still sorting in DATETIME order, was a small workaround with a GROUP BY clause:
SELECT
CONVERT(VARCHAR(10), A.OrderDate, 101) as orderdate
FROM
#temp AS A
GROUP BY
CONVERT(VARCHAR(10), A.OrderDate, 101)
ORDER BY
MAX(A.OrderDate) ASC
Every row in a group shares the same calendar day, so MAX(A.OrderDate) always falls on that day and sorting by it puts the groups in chronological order. I've put a working example with repeats under the following link on SQL Fiddle.
The previous two-column solution may still be helpful:
select distinct
convert(varchar(10),orderdate,101) as OrderDateConverted,
orderdate
from
#temp
order by
orderdate asc
The above query sorts the results according to the DATETIME data type, whereas order by convert(varchar(10),orderdate,101) sorts the values as strings.
You can use a subquery as follows to solve the issue.
SELECT t.OrderDate FROM (
SELECT distinct Convert(VARCHAR(10), orderdate, 101) AS OrderDate
from #temp ) t
order by convert(DATETIME, t.OrderDate, 101) asc

How can I optimize a SQL query that performs a count nested inside a group-by clause?

I have a charting application that dynamically generates SQL Server queries to compute values for each series on a given chart. This generally works quite well, but I have run into a particular situation in which the generated query is very slow. The query looks like this:
SELECT
[dateExpr] AS domainValue,
(SELECT COUNT(*) FROM table1 WHERE [dateExpr]=[dateExpr(maintable)] AND column2='A') AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
I have abbreviated [dateExpr] because it's a combination of CAST and DATEPART functions that convert a datetime field to a string in the form of 'yyyy-MM-dd' so that I can easily group by all values in a calendar day. The query above returns both those yyyy-MM-dd values as labels for the x-axis of the chart and the values from the data series "series1" to display on the chart. The data series is supposed to count the number of records that fall into that calendar day that also contain a certain value in [column2]. The "[dateExpr]=[dateExpr(maintable)]" expression looks like this:
CAST(DATEPART(YEAR,dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,dateCol) AS VARCHAR) =
CAST(DATEPART(YEAR,maintable.dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,maintable.dateCol) AS VARCHAR)
with an additional term for the day (omitted above for the sake of space). That is the source of the slowness of the query, but I don't know how to rewrite the query so that it returns the same result more efficiently. I have complete control over the generation of the query, so if I could find more efficient SQL that returned the same results, I could modify the query generator appropriately. Any pointers would be greatly appreciated.
I haven't tested it, but I think it can be done with:
SELECT
[dateExpr] AS domainValue,
SUM (CASE WHEN column2='A' THEN 1 ELSE 0 END) AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
The fastest way to do this would be to use calendar tables. Create a SQL table with an entry for every month for as many years ahead as you expect to need. Then select from that calendar table, joining in the entries from table1 that have dates between the start and end date for the month. Then, if your clustered index is on dateCol in table1, the query will run very quickly.
EDIT: Example query. This assumes a months table exists with two columns, StartDate and EndDate, where EndDate is midnight on the first day of the next month. The clustered index on the months table should be on StartDate.
SELECT
months.StartDate,
COUNT(*) AS [Count]
FROM months
INNER JOIN table1
ON table1.dateCol >= months.StartDate AND table1.dateCol < months.EndDate
GROUP BY months.StartDate;
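The months table assumed above could be populated along these lines. This is only a sketch: the first month (January 2010), the 240-month span, and the use of sys.all_objects as a convenient row source are all arbitrary choices for illustration, not part of the original answer.

```sql
-- Hypothetical setup for the months table described above.
CREATE TABLE months
(
    StartDate datetime NOT NULL PRIMARY KEY CLUSTERED,
    EndDate   datetime NOT NULL  -- midnight on the first day of the next month
);

WITH n AS
(
    -- 0..239; ROW_NUMBER() returns bigint, and DATEADD needs an int.
    SELECT TOP (240)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS i
    FROM sys.all_objects
)
INSERT INTO months (StartDate, EndDate)
SELECT DATEADD(MONTH, i, '20100101')
     , DATEADD(MONTH, i + 1, '20100101')
FROM n;
```

Because EndDate is exclusive, the join condition dateCol >= StartDate AND dateCol < EndDate assigns every timestamp to exactly one month.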
With Calendar As
(
Select DateAdd(d, DateDiff(d, 0, Min( dateCol ) ), 0) As [date]
From Table1
Union All
Select DateAdd(d, 1, [date])
From Calendar
Where [date] <= (
Select Max( DateAdd(d, DateDiff(d, 0, dateCol) + 1, 0) )
From Table1
)
)
Select C.date, Count(Table1.PK) As Total
From Calendar As C
Left Join Table1
On Table1.dateCol >= C.date
And Table1.dateCol < DateAdd(d, 1, C.date )
And Table1.column2 = 'A'
Group By C.date
Option (Maxrecursion 0);
Rather than try to force the display format in SQL, you should do that in your report or chart generator. However, what you can do in the SQL is to strip the time portion from the datetime values as I've done in my solution.
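On SQL Server 2008 and above, stripping the time portion can also be written as a cast to date, and it combines naturally with the conditional SUM suggested earlier. The following is a minimal sketch under that assumption; the table variable @table1 and its rows are invented so the example runs on its own.

```sql
-- Hypothetical stand-in for table1.
DECLARE @table1 TABLE (dateCol datetime, column2 char(1));

INSERT INTO @table1 (dateCol, column2)
VALUES ('2020-03-01T09:15:00', 'A')
     , ('2020-03-01T18:40:00', 'A')
     , ('2020-03-01T20:00:00', 'B')
     , ('2020-03-02T07:05:00', 'B');

-- One pass over the table: group by calendar day, count only the 'A' rows.
SELECT CAST(dateCol AS date) AS domainValue
     , SUM(CASE WHEN column2 = 'A' THEN 1 ELSE 0 END) AS series1
FROM @table1
GROUP BY CAST(dateCol AS date)
ORDER BY domainValue;
-- Returns 2020-03-01 with series1 = 2 and 2020-03-02 with series1 = 0.
```

Unlike the recursive calendar query, this form only returns days that have at least one row in the table, so days with no data at all would still need a calendar source to appear on the chart.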
