MS Access: Find the most common user per month query

Fairly new to Access 2016 and I'm writing a query to obtain the most common user within the database per month.
So a record would be
Table1
ID Date1
1 2019-02-28
This is my code grouping the totals per month:
Month: Format([Date1],"mmmm")
TopUser: (SELECT TOP 1 [Table1]![ID] FROM [Table1] GROUP BY
[Table1]![ID] order by COUNT([Table1]![ID]) DESC)
Expectation:
Month TopUser
January 2
February 1
March 2
April 3
Result:
Month TopUser
January 2
February 2
March 2
April 2
So my code is returning the most common user overall instead of the most common user for each month. I'm not sure if this is an Access behaviour that I'm misinterpreting or if it's my queries.

Try filtering on the month:
Select
Format([Date1], "yyyymm") As YearMonth,
(Select Top 1 T.ID
From Table1 As T
Where Format(T.[Date1], "yyyymm") = Format(Table1.[Date1], "yyyymm")
Group By ID
Order By Count(T.ID) Desc) As TopID
From
Table1
Group By
Format([Date1], "yyyymm")
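The key point in the answer above is that the subquery has to be correlated with the month of the outer row. As a rough illustration outside Access, the sketch below runs the same idea through Python's sqlite3 module. The sample rows are made up, strftime and LIMIT 1 stand in for Access's Format and TOP 1, and the tie-break on ID is an added assumption rather than part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Date1 TEXT)")
conn.executemany(
    "INSERT INTO Table1 (ID, Date1) VALUES (?, ?)",
    [
        # Hypothetical sample rows, not taken from the question.
        (2, "2019-01-03"), (2, "2019-01-10"), (1, "2019-01-20"),
        (1, "2019-02-05"), (1, "2019-02-28"), (2, "2019-02-11"),
        (3, "2019-03-01"),
    ],
)

query = """
SELECT m.YearMonth,
       (SELECT T.ID
          FROM Table1 AS T
         WHERE strftime('%Y%m', T.Date1) = m.YearMonth  -- restrict to the outer month
         GROUP BY T.ID
         ORDER BY COUNT(*) DESC, T.ID                   -- ties go to the lowest ID
         LIMIT 1) AS TopID
  FROM (SELECT DISTINCT strftime('%Y%m', Date1) AS YearMonth FROM Table1) AS m
 ORDER BY m.YearMonth
"""
top_per_month = conn.execute(query).fetchall()
print(top_per_month)
```

Without the WHERE clause inside the subquery, every month would get the same overall top ID, which is the behaviour described in the question.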

Related

How to find difference between dates and find first purchase in an eCommerce database

I am using Microsoft SQL Server Management Studio. I am trying to measure the customer retention rate of an eCommerce site.
For this, I need four values:
customer_id
order_purchase_timestamp
age_by_month
first_purchase
The values of age_by_month and first_purchase are not in my database. I want to calculate them.
In my database, I have customer_id and order_purchase_timestamp.
The first_purchase should be the earliest instance of order_purchase_timestamp. I only want the month and year.
The age_by_month should be the difference of months from first_purchase to order_purchase_timestamp.
I only want to measure the retention of the customer for each month so if two purchases are made in the same month it shouldn't be shown.
The dates are between 2016-10-01 and 2018-09-30. The result should be ordered by order_purchase_timestamp.
An example
customer_id order_purchase_timestamp
1 2016-09-04
2 2016-09-05
3 2016-09-05
3 2016-09-15
1 2016-10-04
to
customer_id first_purchase age_by_month order_purchase_timestamp
1 2016-09 0 2016-09-04
2 2016-09 0 2016-09-05
3 2016-09 0 2016-09-05
1 2016-09 1 2016-10-04
What I have done
SELECT
customer_id, order_purchase_timestamp
FROM
orders
WHERE
(order_purchase_timestamp BETWEEN '2016-10-01' AND '2016-12-31')
OR (order_purchase_timestamp BETWEEN '2017-01-01' AND '2017-03-31')
OR (order_purchase_timestamp BETWEEN '2017-04-01' AND '2017-06-30')
OR (order_purchase_timestamp BETWEEN '2017-07-01' AND '2017-09-30')
OR (order_purchase_timestamp BETWEEN '2017-10-01' AND '2017-12-31')
OR (order_purchase_timestamp BETWEEN '2018-01-01' AND '2018-03-31')
OR (order_purchase_timestamp BETWEEN '2018-04-01' AND '2018-06-30')
OR (order_purchase_timestamp BETWEEN '2018-07-01' AND '2018-09-30')
ORDER BY
order_purchase_timestamp
Originally I was going to do it by quarters but I want to do it in months now.
The following approach is designed to be relatively easy to understand. There are other ways (e.g., windowed functions) that may be marginally more efficient; but this makes it easy to maintain at your current SQL skill level.
Note that the SQL commands below build on one another (so the answer is at the end). To follow along, here is a db<>fiddle with the working.
It's based around a simple query (which we'll use as a sub-query) that finds the first order_purchase_timestamp for each customer.
SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders
GROUP BY customer_id
The next thing is DATEDIFF to find the difference between 2 dates.
Then, you can use the above as a subquery to get the first date onto each row - then find the date difference e.g.,
SELECT orders.customer_id,
orders.order_purchase_timestamp,
first_purchases.first_purchase_date,
DATEDIFF(month, first_purchases.first_purchase_date, orders.order_purchase_timestamp) AS age_by_month
FROM orders
INNER JOIN
(SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders
GROUP BY customer_id
) AS first_purchases ON orders.customer_id = first_purchases.customer_id
Note - DATEDIFF has a 'gotcha' that catches most people but works in your favour here: when comparing months, it ignores the day component. For example, there is a difference of 0 months between 1 Jan and 31 Jan, but a difference of 1 month between 31 Jan and 1 Feb. However, I think this is actually what you want!
The query above, however, returns one row per purchase, so a customer with several purchases in a month appears several times. Instead, we can GROUP BY the month and take only the first purchase in that month.
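To make the month-boundary behaviour concrete, here is a small Python sketch of the same counting rule. The function name is made up for illustration; it only mirrors how DATEDIFF(month, ...) counts month boundaries crossed and is not how SQL Server implements it.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Count month boundaries crossed, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


same_month = months_between(date(2016, 1, 1), date(2016, 1, 31))
next_month = months_between(date(2016, 1, 31), date(2016, 2, 1))
sample_age = months_between(date(2016, 9, 4), date(2016, 10, 4))
print(same_month, next_month, sample_age)
```

The last call uses customer 1 from the question's sample data (first purchase 2016-09-04, later purchase 2016-10-04).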
A 'direct' approach to this would be to group on YEAR(orders.order_purchase_timestamp) AND MONTH(orders.order_purchase_timestamp). However, I use a little trick below - using EOMONTH which finds the last day of the month. EOMONTH returns the same date for any date in that month; therefore, we can group by that.
Finally, you can add the WHERE expression and ORDER BY to get the results you asked for (between the two dates)
SELECT orders.customer_id,
MIN(orders.order_purchase_timestamp) AS order_purchase_timestamp,
first_purchases.first_purchase_date,
DATEDIFF(month, first_purchases.first_purchase_date, EOMONTH(orders.order_purchase_timestamp)) AS age_by_month
FROM orders
INNER JOIN
(SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders AS orders_ref
GROUP BY customer_id
) AS first_purchases ON orders.customer_id = first_purchases.customer_id
WHERE orders.order_purchase_timestamp BETWEEN '20161001' AND '20180930'
GROUP BY orders.customer_id, first_purchases.first_purchase_date, EOMONTH(orders.order_purchase_timestamp)
ORDER BY order_purchase_timestamp;
Results - note they are different from yours because you wanted the earliest date to be 1/10/2016.
customer_id order_purchase_timestamp first_purchase_date age_by_month
1 2016-10-04 00:00:00.000 2016-09-04 00:00:00.000 1
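If you want to try the grouping logic without a SQL Server instance, the sketch below reproduces it with Python's sqlite3 module. It is an approximation under stated assumptions: SQLite has no EOMONTH or DATEDIFF, so strftime('%Y-%m', ...) is used as the month key and the month difference is computed by hand. The last inserted row is an extra, made-up purchase added only to show the per-month de-duplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_purchase_timestamp TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2016-09-04"), (2, "2016-09-05"), (3, "2016-09-05"),
        (3, "2016-09-15"), (1, "2016-10-04"),
        (1, "2016-10-20"),  # hypothetical second purchase in the same month
    ],
)

query = """
SELECT o.customer_id,
       MIN(o.order_purchase_timestamp) AS first_order_in_month,
       fp.first_purchase_date,
       (CAST(strftime('%Y', MIN(o.order_purchase_timestamp)) AS INTEGER)
          - CAST(strftime('%Y', fp.first_purchase_date) AS INTEGER)) * 12
       + (CAST(strftime('%m', MIN(o.order_purchase_timestamp)) AS INTEGER)
          - CAST(strftime('%m', fp.first_purchase_date) AS INTEGER)) AS age_by_month
  FROM orders AS o
  JOIN (SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
          FROM orders
         GROUP BY customer_id) AS fp
    ON fp.customer_id = o.customer_id
 WHERE o.order_purchase_timestamp >= '2016-10-01'
   AND o.order_purchase_timestamp <  '2018-10-01'
 GROUP BY o.customer_id, fp.first_purchase_date,
          strftime('%Y-%m', o.order_purchase_timestamp)  -- one row per customer per month
 ORDER BY first_order_in_month
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that the first-purchase subquery is deliberately left unfiltered, so a first purchase before the reporting window (September 2016 here) still counts as the customer's starting month.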
Edit: Because someone else will do it like this otherwise!
You can do this with a single read-through that will potentially run a little faster. It is also a bit shorter - but harder to understand imo.
The below uses window functions to calculate both the customer's earliest purchase and the earliest purchase for each month (and uses DISTINCT rather than a GROUP BY). With that, it just does the DATEDIFF to calculate the difference.
WITH monthly_orders AS
(SELECT DISTINCT orders.customer_id,
MIN(orders.order_purchase_timestamp) OVER (PARTITION BY orders.customer_id, EOMONTH(orders.order_purchase_timestamp)) AS order_purchase_timestamp,
MIN(orders.order_purchase_timestamp) OVER (PARTITION BY orders.customer_id) AS first_purchase_date
FROM orders)
SELECT *, DATEDIFF(month, first_purchase_date, order_purchase_timestamp) AS age_by_month
FROM monthly_orders
WHERE order_purchase_timestamp BETWEEN '20161001' AND '20180930';
Note, however, that this has one difference in the results. If you have 2 orders in a month, and your lowest date filter falls between the two (e.g., orders on 15/10 and 20/10, and your minimum date is 16/10), then the row won't be included, as the earliest purchase in the month is outside the filter range.
Also beware, with both of these, of what type of date or datetime field you are using - if you have datetimes rather than just dates, BETWEEN '20161001' AND '20180930' is not the same as >= '20161001' AND < '20181001', because '20180930' means midnight at the start of 30 September, so any purchase later that day is excluded.
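The difference between the two range styles is easy to see with plain datetime comparisons. The sketch below is only an illustration in Python; the timestamp is a made-up purchase on the afternoon of the last day.

```python
from datetime import datetime

lower = datetime(2016, 10, 1)            # '20161001' as a datetime is midnight
upper_inclusive = datetime(2018, 9, 30)  # '20180930' is also midnight, at the START of the day
upper_exclusive = datetime(2018, 10, 1)

# Hypothetical purchase on the afternoon of the last day in the range.
purchase = datetime(2018, 9, 30, 14, 30)

in_between = lower <= purchase <= upper_inclusive  # BETWEEN-style, closed range
in_half_open = lower <= purchase < upper_exclusive # >= ... AND < ... style
print(in_between, in_half_open)
```

With a pure date column the two filters agree; the half-open form is simply the one that keeps working if the column ever carries a time component.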
Here is a short query that achieves all you want (descriptions of the methods used are inline):
declare @test table (
customer_id int,
order_purchase_timestamp date
)
-- some test data
insert into @test values
(1, '2016-09-04'),
(2, '2016-09-05'),
(3, '2016-09-05'),
(3, '2016-09-15'),
(1, '2016-10-04');
select
customer_id,
-- takes care of correct display of first_purchase
format(first_purchase, 'yyyy-MM') first_purchase,
-- used to get the difference in months
datediff(m, first_purchase, order_purchase_timestamp) age_by_month,
order_purchase_timestamp
from (
select
*,
-- window function used to find min value for given column within group
-- for each row
min(order_purchase_timestamp) over (partition by customer_id) first_purchase
from @test
) a

Prevent duplicate values using group by and count distinct simultaneously?

I have a simple table with years and customer id and now I want to group by year and count distinct customers for each year. This is straightforward and works fine, my issue is that I don't want customers in year 1 to repeat in year 2, I only want to see new customers for each year. So how do I do that?
I have tried using count distinct with group by, and even NOT IN, but it doesn't seem to work; it always gives me repeating values.
select count(distinct customerid), year
FROM customers
group by year
Let's say I have 100 customers for years 2015 to 2019.
now I have
Year No of Customers
2015 30
2016 35
2017 40
2018 30
2019 10
Total 145 which is 45 more than 100
What I want is
Year No of Customers
2015 30
2016 30
2017 20
2018 20
2019 10
Total 100
If you only want to count customers in the first year they appear, then use two levels of aggregation:
select min_year, count(*)
from (select customerid, min(year) as min_year
from customers c
group by customerid
) c
group by min_year
order by min_year;
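A quick way to convince yourself that the two-level aggregation counts each customer exactly once is to run it on a tiny data set. The sketch below does that with Python's sqlite3 module; the rows are invented for illustration and the column names follow the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        # Hypothetical activity: 4 distinct customers over 3 years.
        (1, 2015), (2, 2015),
        (1, 2016), (3, 2016),
        (2, 2017), (3, 2017), (4, 2017),
    ],
)

# Naive version from the question: returning customers are counted again.
active_per_year = conn.execute(
    "SELECT year, COUNT(DISTINCT customerid) FROM customers "
    "GROUP BY year ORDER BY year"
).fetchall()

# Two levels of aggregation: first year per customer, then customers per first year.
new_per_year = conn.execute(
    """
    SELECT min_year, COUNT(*) AS new_customers
      FROM (SELECT customerid, MIN(year) AS min_year
              FROM customers
             GROUP BY customerid) AS first_seen
     GROUP BY min_year
     ORDER BY min_year
    """
).fetchall()
print(active_per_year, new_per_year)
```

The per-year counts in the second result add up to the number of distinct customers, which is the property the question asks for.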
To get the total, you can use grouping sets or rollup (not all databases support these). A typical method is:
select min_year, count(*)
from (select customerid, min(year) as min_year
from customers c
group by customerid
) c
group by min_year with rollup;
Perhaps something like this:
SELECT count (distinct c1.customerID), c1.Year
FROM customers c1
WHERE c1.customerID not in (
SELECT c2.customerID
FROM customers c2
WHERE c2.year < c1.year
)
GROUP BY year

Query Most Recent Records in MS Access Based on Date Provided in Form Field

Let me start by noting I have spent a few days searching through S.O. and have not been able to find a solution. I apologize in advance if the solution is very simple, but I am still learning and appreciate any help I can get.
I have a MS Access 2010 Database, and I am trying to create a set of queries to inform other forms and queries. There are two tables: Borrower Contact Info (BC_Info) and Basic Financial Indicators (BF_Indicators). Each month, I review and track key performance metrics of each borrower. I would like to create a query that supplies the most recent record based on a textbox input (Forms![Portfolio_Review Menu]!Text47).
Two considerations have separated this from other posts I have seen in the 'greatest-n-per-group' tag:
Not every borrower will have data for every month.
I need to be able to see back in time, i.e. if it is January 1, 2019 and I want to see the metrics as of July 31, 2017, I want to make sure I am only seeing data from before July 31, 2017 but as close to this date as possible.
Fields are as follows:
BC_Info
- BorrowerName
- PartnerID
BF_Indicators
- Fin_ID
- DateUpdated
The tables are connected by BorrowerName -- which is a unique naming convention used for the primary key of BC_Info.
What I currently have is:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT Fin_ID, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= Forms![Portfolio_Review Menu]!Text47 OR
Forms![Portfolio_Review Menu]!Text47 IS NULL)
GROUP BY Fin_ID
) AS Last_BF ON BFI.Fin_ID = Last_BF.Fin_ID AND
BFI.DateUpdated = Last_BF.MAX_DATE);
This gives me the fields I need, and will keep records out that are past the date given in the textbox, but will give all records from before the textbox input -- not just the most recent.
Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
MEHN-45543 19 18 9/30/2018
Expected Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
As mentioned, I am planning to use the results of this Query to generate further queries that use aggregated information from the Financial Indicators to determine portfolio quality at the time.
Please let me know if there is any other information I can provide. And again, thank you in advance.
Try joining BC_Info to a query that aggregates BF_Indicators on BorrowerName, not Fin_ID. Tested with literal date value:
SELECT BC_Info.*, MaxDate
FROM BC_Info
INNER JOIN
(SELECT BorrowerName, Max(DateUpdated) AS MaxDate
FROM BF_Indicators WHERE DateUpdated <=#12/31/2018# GROUP BY BorrowerName) AS Q1
ON BC_Info.BorrowerName=Q1.BorrowerName;
If you need to include Fin_ID in the results, then:
SELECT BC_Info.*, Fin_ID, DateUpdated FROM BC_Info
INNER JOIN
(SELECT * FROM BF_Indicators WHERE Fin_ID IN
(SELECT TOP 1 Fin_ID FROM BF_Indicators AS Dupe
WHERE Dupe.BorrowerName=BF_Indicators.BorrowerName AND DateUpdated<=#12/31/2018#
ORDER BY Dupe.DateUpdated DESC)
) AS Q1
ON BC_Info.BorrowerName = Q1.BorrowerName;
If you don't like TOP N, adjust your original query:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT BorrowerName, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= #12/31/2018#)
GROUP BY BorrowerName
) AS Last_BF ON BFI.BorrowerName = Last_BF.BorrowerName AND
BFI.DateUpdated = Last_BF.MAX_DATE);
And 1 more to think about:
SELECT BC_Info.PartnerID, BC_Info.BorrowerName, BF_Indicators.Fin_ID, BF_Indicators.DateUpdated
FROM BC_Info RIGHT JOIN BF_Indicators ON BC_Info.BorrowerName = BF_Indicators.BorrowerName
WHERE (((BF_Indicators.DateUpdated)=DMax("DateUpdated","BF_Indicators","BorrowerName='" & [BC_Info].[BorrowerName] & "' AND DateUpdated<=#12/31/2018#")));
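The common thread in these variants is grouping the indicator rows by BorrowerName (not Fin_ID) after applying the cutoff date. A minimal sketch of that idea in Python's sqlite3 module follows, using the rows from the question's result table with dates rewritten in ISO format. The function name and the query parameter are stand-ins for the form textbox, not part of the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE BC_Info (BorrowerName TEXT PRIMARY KEY, PartnerID INTEGER);
    CREATE TABLE BF_Indicators (Fin_ID INTEGER PRIMARY KEY,
                                BorrowerName TEXT, DateUpdated TEXT);
    INSERT INTO BC_Info VALUES
      ('MEHN-45543', 19), ('ARYS-7940', 5), ('FINS-21032', 12), ('ELET-00934', 9);
    INSERT INTO BF_Indicators VALUES
      (9,  'MEHN-45543', '2018-12-31'),
      (10, 'ARYS-7940',  '2018-09-30'),
      (11, 'FINS-21032', '2018-09-30'),
      (12, 'ELET-00934', '2018-09-30'),
      (18, 'MEHN-45543', '2018-09-30');
    """
)


def latest_as_of(cutoff):
    """Return each borrower's most recent indicator row on or before cutoff."""
    query = """
    SELECT bci.BorrowerName, bci.PartnerID, bfi.Fin_ID, bfi.DateUpdated
      FROM BC_Info AS bci
      JOIN BF_Indicators AS bfi
        ON bfi.BorrowerName = bci.BorrowerName
      JOIN (SELECT BorrowerName, MAX(DateUpdated) AS MaxDate
              FROM BF_Indicators
             WHERE DateUpdated <= ?          -- the "as of" date
             GROUP BY BorrowerName) AS latest
        ON latest.BorrowerName = bfi.BorrowerName
       AND latest.MaxDate = bfi.DateUpdated
     ORDER BY bfi.Fin_ID
    """
    return conn.execute(query, (cutoff,)).fetchall()


print(latest_as_of("2018-12-31"))
print(latest_as_of("2018-10-31"))
```

With the later cutoff, MEHN-45543 appears once with Fin_ID 9; with the earlier one, the same borrower falls back to Fin_ID 18.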

Join two tables depending on date sequence

In SQL Server 2008, I want to join two tables depending on date sequence. More specifically, I need to left join Payments table to Profiles table by the following rules:
UserId has to be matched.
Every record in Payments matches the record in Profiles with the closest Profiles.CreationDate before Payments.PayDate.
For a simplified example,
Table Payments:
UserId PayDate Amount
1 2012 400
1 2010 500
2 2014 600
Table Profiles:
UserId CreationDate Address
1 2009 NY
1 2015 MD
2 2007 NJ
2 2013 MA
3 2008 TX
Desired Result:
UserId CreationDate PayDate Amount Address
1 2009 2010 500 NY
1 2009 2012 400 NY
2 2013 2014 600 MA
It's guaranteed that a user has at least 1 Profiles record before he pays. Another restriction is that I am not authorized to write anything into the database.
My idea is to first left join Payments with Profiles, then, within the record group matching each (UserId, PayDate) tuple, sort by CreationDate and select the last record. But I don't know how to implement it in SQL, or whether there are better ways to do this merge.
Use Outer Apply to do this.
SELECT py.UserId,
CreationDate,
PayDate,
Amount,
Address
FROM Payments py
OUTER APPLY (SELECT TOP 1 *
FROM Profiles pr
WHERE py.UserId = pr.UserId
and PayDate> CreationDate
ORDER BY CreationDate desc) cs
SQLFIDDLE DEMO
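For readers without SQL Server at hand, the same "latest profile before each payment" rule can be checked with Python's sqlite3 module. SQLite has no OUTER APPLY, so this sketch swaps in a different technique: a LEFT JOIN whose ON clause uses a correlated MAX subquery. The data is the simplified example from the question, with years stored as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Payments (UserId INTEGER, PayDate INTEGER, Amount INTEGER);
    CREATE TABLE Profiles (UserId INTEGER, CreationDate INTEGER, Address TEXT);
    INSERT INTO Payments VALUES (1, 2012, 400), (1, 2010, 500), (2, 2014, 600);
    INSERT INTO Profiles VALUES
      (1, 2009, 'NY'), (1, 2015, 'MD'),
      (2, 2007, 'NJ'), (2, 2013, 'MA'),
      (3, 2008, 'TX');
    """
)

query = """
SELECT py.UserId, pr.CreationDate, py.PayDate, py.Amount, pr.Address
  FROM Payments AS py
  LEFT JOIN Profiles AS pr
    ON pr.UserId = py.UserId
   AND pr.CreationDate = (SELECT MAX(p2.CreationDate)   -- latest profile ...
                            FROM Profiles AS p2
                           WHERE p2.UserId = py.UserId
                             AND p2.CreationDate < py.PayDate)  -- ... before the payment
 ORDER BY py.UserId, py.PayDate
"""
matched = conn.execute(query).fetchall()
print(matched)
```

This only reads from the tables, which fits the restriction that nothing may be written to the database.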

T-SQL - Getting most recent date and most recent future date

Assume the table of records below
ID Name AppointmentDate
-- -------- ---------------
1 Bob 1/1/2010
1 Bob 5/1/2010
2 Henry 5/1/2010
2 Henry 8/1/2011
3 John 8/1/2011
3 John 12/1/2011
I want to retrieve the most recent appointment date by person. So I need a query that will give the following result set.
1 Bob 5/1/2010 (5/1/2010 is most recent)
2 Henry 8/1/2011 (8/1/2011 is most recent)
3 John 8/1/2011 (has 2 future dates but 8/1/2011 is most recent)
Thanks!
Assuming that where you say "most recent" you mean "closest", as in "stored date is the fewest days away from the current date and we don't care if it's before or after the current date", then this should do it (trivial debugging might be required):
SELECT ID, Name, AppointmentDate
from (select
ID
,Name
,AppointmentDate
,row_number() over (partition by ID order by abs(datediff(dd, AppointmentDate, getdate()))) Ranking
from MyTable) xx
where Ranking = 1
This uses the row_number() function, available in SQL Server 2005 and up. The subquery "orders" the data as per the specifications, and the main query picks the best fit.
Note also that:
The search is based on the current date
We're only calculating difference in days, time (hours, minutes, etc.) is ignored
If two dates are equidistant (say, 2 days before and 2 days after), which one is picked is arbitrary
All of which could be adjusted based on your final requirements.
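As a rough cross-check of the "closest date per person" logic, here is the same ranking expressed in plain Python on the question's sample rows. The reference date is fixed (and made up) so the result is reproducible, and the tie-break in favour of the earlier date is an added assumption showing one way to make the equidistant case deterministic.

```python
from datetime import date

appointments = [
    (1, "Bob", date(2010, 1, 1)),
    (1, "Bob", date(2010, 5, 1)),
    (2, "Henry", date(2010, 5, 1)),
    (2, "Henry", date(2011, 8, 1)),
    (3, "John", date(2011, 8, 1)),
    (3, "John", date(2011, 12, 1)),
]


def closest_per_person(rows, today):
    """Keep, per ID, the row whose date is the fewest days from today."""
    best = {}
    for person_id, name, appointment in rows:
        # Sort key: distance in days first, then the earlier date on a tie.
        key = (abs((appointment - today).days), appointment)
        if person_id not in best or key < best[person_id][0]:
            best[person_id] = (key, (person_id, name, appointment))
    return [best[person_id][1] for person_id in sorted(best)]


# Hypothetical "current date" standing in for getdate().
closest = closest_per_person(appointments, today=date(2011, 7, 1))
print(closest)
```

In the SQL version, the equivalent tie-break would be an extra column in the ORDER BY of the row_number() window.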
(Phillip beat me to the punch, and windowing functions are an excellent choice. Here's an alternative approach:)
Assuming I correctly understand your requirement as getting the date closest to the present date, whether in the past or future, consider this query:
SELECT t.Name, t.AppointmentDate
FROM
(
SELECT Name, AppointmentDate, ABS(DATEDIFF(d, GETDATE(), AppointmentDate)) AS Distance
FROM MyTable
) t
JOIN
(
SELECT Name, MIN(ABS(DATEDIFF(d, GETDATE(), AppointmentDate))) AS MinDistance
FROM MyTable
GROUP BY Name
) d ON t.Name = d.Name AND t.Distance = d.MinDistance
