Get count of latest consecutive daily logins - sql-server

I have a SQL table containing a list of the daily logins of the subscribers to my site. The rows contain the user id and the date and time of the first login of each day, which means there is a maximum of one record per day for each member.
Is there a way to use SQL to get a count of the number of consecutive daily logins for each member, that is, the latest login streak?
I could do this programmatically (C#) by going through each record for a user in reverse order and stop counting when a day is missing, but I was looking for a more elegant way to do this through a SQL function. Is this at all possible?
Thanks!

Answer from comment
You can use the LAG function: https://msdn.microsoft.com/en-IN/library/hh231256.aspx
If your database compatibility level is lower than 110, you can't use the LAG function.
The following code should get the latest streak of logins for you (it works only when there is a single record per day, for the first login of the day).
Suppose your table of dates for a single user is:
pk_id dates
----------- -----------
27 2017-04-02
28 2017-04-03
29 2017-04-04
30 2017-04-05
31 2017-04-06
44 2017-04-09
45 2017-04-10
46 2017-04-11
47 2017-04-12
48 2017-04-13
then
SELECT ROW_NUMBER() OVER (ORDER BY dates DESC) AS Row#, dates
INTO #temp1
FROM yourTable
WHERE userid = @userid
SELECT TOP 1 a.Row# AS LatestStreak
FROM #temp1 a INNER JOIN #temp1 b ON a.Row# = b.Row# - 1
WHERE a.dates <> DATEADD(DAY, 1, b.dates)
ORDER BY a.Row# ASC
This gives you 5. I used an inner join instead of LAG so that it won't run into server compatibility issues.
Alternatively, you can use the query below; if you select c.dates instead of COUNT(*), you also get the dates in the latest streak.
SELECT COUNT(*)
FROM
(SELECT MAX (a.dates) AS Latest
FROM #yourtable a
WHERE DATEADD(DAY,-1,dates)
NOT IN (SELECT dates FROM #yourtable)) AS B
JOIN #yourtable c
ON c.dates >= b.Latest
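To make the logic of that second query easier to check, here is a minimal Python sketch of the same idea, assuming one date per login day for a single user. The function name `latest_streak` is made up for illustration; it is not part of the answer above.

```python
from datetime import date, timedelta


def latest_streak(dates):
    """Count the dates in the most recent run of consecutive days."""
    days = set(dates)
    if not days:
        return 0
    # A streak starts on any date whose previous day is absent
    # (this mirrors the NOT IN test in the SQL).
    starts = [d for d in days if d - timedelta(days=1) not in days]
    latest_start = max(starts)  # mirrors MAX(a.dates) AS Latest
    # Every date on or after that start belongs to the latest streak.
    return sum(1 for d in days if d >= latest_start)


# The sample dates from the table above (April 2017).
logins = [date(2017, 4, d) for d in (2, 3, 4, 5, 6, 9, 10, 11, 12, 13)]
print(latest_streak(logins))  # 5
```

The earliest date always counts as a streak start, so a history without any gap is returned as one streak covering every row.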

This solution is probably similar to what you want:
How do I group on continuous ranges
It has a link to the motivating explanation here:
Things SQL needs: SERIES()
The main idea is that once you have partitioned the rows by user id and ordered them by date, the difference between each date and its row number is invariant within a series of consecutive dates. So you group by user and this invariant (taking the minimum date within each group as the start of the series). Then you can group by user, count the rows in each series, and keep only the maximum count (for the latest streak rather than the longest, keep the series with the most recent dates instead).
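The invariant is easier to see on concrete data. The following is only a rough Python sketch of the idea for a single user, not the code from the linked answers; `login_streaks` is a hypothetical helper name.

```python
from datetime import date, timedelta
from itertools import groupby


def login_streaks(dates):
    """Split dates into runs of consecutive days, oldest run first."""
    ordered = sorted(set(dates))
    # Subtracting the row number from the date yields the same value
    # for every date in a run of consecutive days.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    return [[d for _, d in group]
            for _, group in groupby(keyed, key=lambda pair: pair[0])]


logins = [date(2017, 4, d) for d in (2, 3, 4, 5, 6, 9, 10, 11, 12, 13)]
runs = login_streaks(logins)
print(len(runs))      # 2 runs: 2-6 April and 9-13 April
print(len(runs[-1]))  # 5, the length of the latest streak
```

In SQL the same grouping key would typically come from ROW_NUMBER() partitioned by user and ordered by date.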

Related

Get overlapping dates between two date ranges (in columns) - WITHOUT creating database objects

I have a tricky situation in Microsoft SQL Server 2016, in which I need to get a list of dates that an employee was in Leave of Absence (LOA) in a PayPeriod, with fixed PeriodStart and fixed PeriodEnd columns.
See the figure below (the source dataset):
I have 4 employees in 5 rows of a dataset.
PeriodStart and PeriodEnd are always fixed, with the values Dec 15 and Dec 22 respectively (for 2020). I have each employee's LOA Start Date and LOA End Date in separate columns. The source dataset will have only one set of PeriodStart and PeriodEnd dates at any given time. Say, in the above case, it is ALWAYS Dec 15 and Dec 22. In some other cases, it will be Dec 22 and Dec 29. But only one range at a given time. The source dataset cannot contain Dec 15 - Dec 22 for Employee X, and Dec 22 - Dec 29 for Employee Y.
The desired output is as below:
The challenge here is, I am using our client's Query Builder, which cannot use T-SQL objects such as Temp tables (#), Table Variables (@), Common Table Expressions (CTE), User Defined Functions or even Views.
This is purely ad-hoc reporting, where you can ONLY create derived tables (or subqueries) and have an alias name and use it as a dataset. Such a dataset can be used in JOINs, and other regular stuff.
For example:
SELECT a, b
FROM
(SELECT t1.a, t2.b
FROM table1 t1
INNER JOIN table2 t2
ON t1.ID = t2.ID) XYZ
The derived table (or sub query) XYZ is the main dataset for me.
I need my desired output to be aliased XYZ.
Can anyone help me achieve this?
SELECT
yourTable.EmployeeID,
DATEADD(DAY, calendar.date_id, yourTable.PeriodStart)
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY the_primary_key) - 1 AS date_id
FROM
any_big_enough_table
)
AS calendar
INNER JOIN
yourTable
ON calendar.date_id <= DATEDIFF(DAY, yourTable.PeriodStart, yourTable.PeriodEnd)
AND calendar.date_id >= DATEDIFF(DAY, yourTable.PeriodStart, yourTable.LOAStartDate)
AND calendar.date_id <= DATEDIFF(DAY, yourTable.PeriodStart, yourTable.LOAEndDate)
Please excuse typos, I'm on my phone.
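For anyone who wants to sanity-check the three join conditions, here is a small Python sketch of the same day-offset logic. The LOA dates in the example are invented for illustration, since the figures from the question are not available here, and the function name is hypothetical.

```python
from datetime import date, timedelta


def loa_dates_in_period(period_start, period_end, loa_start, loa_end):
    """List each date that lies in both the pay period and the LOA range."""
    period_len = (period_end - period_start).days
    first = (loa_start - period_start).days  # offset of the LOA start
    last = (loa_end - period_start).days     # offset of the LOA end
    # n plays the role of calendar.date_id: 0, 1, 2, ... days after PeriodStart.
    return [period_start + timedelta(days=n)
            for n in range(period_len + 1)
            if first <= n <= last]


# Hypothetical employee on leave from Dec 10 to Dec 17, 2020.
days = loa_dates_in_period(date(2020, 12, 15), date(2020, 12, 22),
                           date(2020, 12, 10), date(2020, 12, 17))
print(days)  # Dec 15, Dec 16 and Dec 17
```

Note that date_id starts at 0, so an LOA that begins before PeriodStart produces a negative offset and is simply clipped to the start of the period.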

trying to break down results of SQL query to show data for each month

I'm very new to SQL and have a problem I can't figure out.
I'm trying to replace an Excel spreadsheet and turn it into a Power BI report. Currently our team runs the following query to get the number of active users every month and types it into an Excel sheet, which then graphs the number of users each month, showing the increase. Since I don't want to manually input data each month, my goal is to break down this query to give the current number of users in each month and add to that every month.
Desired result would look something like this
dateCreated # of Users
----------------------
2008-10 295
2008-11 355
2008-12 470
2009-01 522
I was able to break it down enough to give me the amount created each month, but that doesn't give me the total amount each month. This is the query that I used and a sample of the results I got.
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') AS 'dateCreated',
COUNT(s.UserId) AS "# of Users"
FROM
ER.dbo.ssUser s,
ER.dbo.ssUserAddress USERADDR,
ER.dbo.ssAddress ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
GROUP BY
FORMAT(USERADDR.DateCreated, 'yyyy-MM')
result sample:
dateCreated # of Users
2008-10 295
2008-11 41
2008-12 22
2009-01 19
This is almost there, but I need a running total. I've tried a lot of different things including SUM, SUM OVER, COUNT OVER etc. My boss suggested a while loop. I can't get that to work either and everything I've read says that should be the last resort. Here is one example of my failed attempts
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') as 'dateCreated',
COUNT(s.UserId)
OVER(
PARTITION BY Month(USERADDR.DateCreated)
GROUP BY FORMAT(USERADDR.DateCreated, 'yyyy-MM')
)
AS "# of Users"
FROM
ER.dbo.User s,
ER.dbo.UserAddress USERADDR,
ER.dbo.Address ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
--original query which gives total number of users right now.
SELECT
count(s.UserId) AS "# of Users"
FROM
ER.dbo.User s,
ER.dbo.UserAddress USERADDR,
ER.dbo.Address ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
You can do a window sum() on the aggregated count of users per month, like so:
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') [dateCreated],
SUM(COUNT(s.UserId)) OVER(ORDER BY FORMAT(USERADDR.DateCreated, 'yyyy-MM')) [# of Users]
FROM
ER.dbo.ssUser s
INNER JOIN ER.dbo.ssUserAddress USERADDR
ON s.UserId = USERADDR.UserId
INNER JOIN ER.dbo.ssAddress ADDRESS
ON USERADDR.AddressId = ADDRESS.AddressId
WHERE Isdefault = 1 AND Type = 'soldto'
group by FORMAT(USERADDR.DateCreated, 'yyyy-MM')
Notes:
Always prefer proper, explicit join syntax (with the ON keyword) over implicit, old-school comma joins, which the ANSI-92 join syntax superseded a long time ago. I modified your query accordingly.
SQL Server uses square brackets for identifiers. You should avoid single quotes, as they are generally used for literal strings.
You have unqualified column names in the WHERE clause: always qualify column names in your query, so it is easy to see which table they belong to.
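To illustrate what the window SUM() over the monthly COUNT() produces, here is a quick Python sketch that applies a running total to the four sample rows from the question. It only models the arithmetic, not the query itself.

```python
from itertools import accumulate

# Monthly counts taken from the "result sample" in the question.
monthly_new = [("2008-10", 295), ("2008-11", 41), ("2008-12", 22), ("2009-01", 19)]

# Sort by month first, as the ORDER BY inside OVER() does.
ordered = sorted(monthly_new)
months = [month for month, _ in ordered]
running = list(accumulate(count for _, count in ordered))

for month, total in zip(months, running):
    print(month, total)
# 2008-10 295
# 2008-11 336
# 2008-12 358
# 2009-01 377
```

Because the month key is formatted as yyyy-MM, sorting it as text gives chronological order, which is why ordering by the formatted string works in the query.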

Count by days, with all days

I need to count records by day, even for days on which there were no records.
Counting by day is easy, sure.
But how can I make it print a row such as "on day 2018-01-10 there were 0 records"?
Should I use CONNECT BY LEVEL? Any help would be appreciated. I can't use PL/SQL, just Oracle SQL.
First you generate every date that you want in an inline view. I chose every date for the current year because you didn't specify. Then you left outer join on date using whichever date field you have in that table. If you count on a non-null field from the source table then it will count 0 rows on days where there is no join.
select Dates.r, count(tablename.id)
from (select trunc(sysdate,'YYYY') + level - 1 R
from dual
connect by level <= trunc(add_months(sysdate,12),'YYYY') - trunc(sysdate,'YYYY')) Dates
left join tablename
on trunc(tablename.datefield) = Dates.r
group by Dates.r
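The "generate every date, then left join" pattern can be sketched outside the database as well. Below is a minimal Python version under the assumption that the records are already reduced to plain dates; the function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def counts_for_every_day(record_dates, first_day, last_day):
    """Return (day, count) for each day in the range, including zero counts."""
    counts = Counter(record_dates)  # missing days read as 0, like the outer join
    n_days = (last_day - first_day).days + 1
    all_days = (first_day + timedelta(days=i) for i in range(n_days))
    return [(day, counts[day]) for day in all_days]


# Hypothetical records: two on Jan 9, none on Jan 10, one on Jan 11.
records = [date(2018, 1, 9), date(2018, 1, 9), date(2018, 1, 11)]
for day, n in counts_for_every_day(records, date(2018, 1, 9), date(2018, 1, 11)):
    print(day, n)
# 2018-01-09 2
# 2018-01-10 0
# 2018-01-11 1
```

The generated date list plays the role of the Dates inline view, and the Counter lookup stands in for counting a non-null column after the left join.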

PostgreSQL - Filter column 2 results based on column 1

Forgive a novice question. I am new to postgresql.
I have a database full of transactional information. My goal is to iterate through each day since the first transaction, and show how many unique users made a purchase on that day, or in the 30 days previous to that day.
So the # of unique users on 02/01/2016 should show all unique users from 01/01/2016 through 02/01/2016. The # of unique users on 02/02/2016 should show all unique users from 01/02/2016 through 02/02/2016.
Here is a fiddle with some sample data: http://sqlfiddle.com/#!15/b3d90/1
The result should be something like this:
December 17 2014 -- 1
December 18 2014 -- 2
December 19 2014 -- 3
...
January 13 2015 -- 16
January 19 2015 -- 15
January 20 2015 -- 15
...
The best I've come up with is the following:
SELECT
to_char(S.created, 'YYYY-MM-DD') AS my_day,
COUNT(DISTINCT
CASE
WHEN S.created > S.created - INTERVAL '30 days'
THEN S.user_id
END)
FROM
transactions S
GROUP BY my_day
ORDER BY my_day;
As you can see, I have no idea how I could reference what exists in column one in order to specify what date range should be included in the filter.
Any help would be much appreciated!
I think if you do a self-join, it would give you the results you seek:
select
t1.created,
count (distinct t2.user_id)
from
transactions t1
join transactions t2 on
t2.created between t1.created - interval '30 days' and t1.created
group by
t1.created
order by
t1.created
That said, I think this is going to do a form of Cartesian join in the background, so for large datasets I doubt it's very efficient. If you run into huge performance problems, there are ways to make this a lot faster... but before you address that, find out if you need to.
-- EDIT 8/20/16 --
In response to your issue with the performance of this... yes, it's a pig. I admit it. I encountered a similar issue here:
PostgreSQL Joining Between Two Values
The same concept for your example is this:
with xtrans as (
select created, created + generate_series(0, 30) as create_range, user_id
from transactions
)
select
t1.created,
count (distinct t2.user_id)
from
transactions t1
join xtrans t2 on
t2.create_range = t1.created
group by
t1.created
order by
t1.created
It's not as easy to follow, but it should yield identical results, only it will be significantly faster because it's not doing the "glorified cross join."
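If you want to verify either query against a small dataset, a brute-force reference is handy. The Python sketch below computes the same rolling count of distinct users directly; the transactions are made up for illustration and the function name is hypothetical.

```python
from datetime import date, timedelta


def rolling_unique_users(transactions, window_days=30):
    """For each transaction day, count distinct users in the trailing window."""
    rows = list(transactions)  # (created_date, user_id) pairs
    days = sorted({created for created, _ in rows})
    result = []
    for day in days:
        start = day - timedelta(days=window_days)
        # Same range as: t2.created BETWEEN t1.created - 30 days AND t1.created
        users = {user for created, user in rows if start <= created <= day}
        result.append((day, len(users)))
    return result


# Hypothetical sample data.
sample = [
    (date(2014, 12, 17), 1),
    (date(2014, 12, 18), 2),
    (date(2014, 12, 18), 1),
    (date(2015, 1, 17), 3),
    (date(2015, 1, 18), 3),
]
print(rolling_unique_users(sample))
# counts per day: 1, 2, 3, 1
```

Like the self-join, this only reports days that have at least one transaction; it is quadratic in the number of rows, so it is meant for checking results, not for production use.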

SQL Server Stored Procedure get nearest available date to parameter

I have a table of database size information. The data is collected daily. However, some days are missed for various reasons. Additionally, we have databases which come and go, or whose size does not get recorded for a day or two. This all leads to very inconsistent data collection regarding dates. I want to construct a SQL procedure which will generate a percentage of change between any two dates (1 week, monthly, quarterly, etc.) for ALL databases. The problem is what to do if a chosen date is missing (no rows for that date, or no row for one or more databases for that date). What I want to be able to do is get the nearest available date for each database for the two dates (begin and end).
For instance, if database Mydb has these recording dates:
2015-05-03
2015-05-04
2015-05-05
2015-05-08
2015-05-09
2015-05-10
2015-05-11
2015-05-12
2015-05-14
and I want to compare 2015-05-06 with 2015-05-14
The 2015-05-06 date is missing, as is 2015-05-07, so I would want to use the next available date, which is 2015-05-08. Keep in mind, MyOtherDB may only be missing the 2015-05-06 date but have the 2015-05-07 date available. So, for MyOtherDB I would be using 2015-05-07 for my comparison.
Is there a way to proceduralize this with SQL WITHOUT using a CURSOR?
You're overthinking this. Simply use a BETWEEN predicate in your WHERE clause that takes the two parameters.
In your example, if you perform the query:
SELECT * FROM DATABASE_AUDIT WHERE DATE BETWEEN param1 /*2015-05-06*/ and param2 /*2015-05-14*/
It will give you the desired results.
select 100.0 * (b.dbsize - a.dbsize) / a.dbsize as dbSizeChangePercent from
( select top 1 * from dbAudit where auditDate = (select min(auditDate) from dbAudit where auditDate between '01/01/2015' and '01/07/2015')) a
cross join
(select top 1 * from dbAudit where auditDate = (select max(auditDate) from dbAudit where auditDate between '01/01/2015' and '01/07/2015')) b
The top 1 can be replaced by a group by. This assumes only one database audit per day.
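The per-database lookup the question describes (use the requested date if it exists, otherwise the next recorded one) can be sketched in Python as follows. This only models the date selection, not the percentage calculation, and `next_available` is a hypothetical helper name.

```python
from bisect import bisect_left
from datetime import date


def next_available(recorded, wanted):
    """Return the first recorded date on or after `wanted`, or None."""
    ordered = sorted(recorded)
    i = bisect_left(ordered, wanted)
    return ordered[i] if i < len(ordered) else None


# Recording dates for Mydb from the question (May 2015).
mydb = [date(2015, 5, d) for d in (3, 4, 5, 8, 9, 10, 11, 12, 14)]
print(next_available(mydb, date(2015, 5, 6)))   # 2015-05-08
print(next_available(mydb, date(2015, 5, 14)))  # 2015-05-14
```

In SQL, the equivalent per-database step would be a MIN(auditDate) over the rows with auditDate on or after the requested date, grouped by database, which avoids a cursor.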
