I am trying to do daily analysis on a particular set of data. The table looks like this:
custNo visitTime FirstVisit
1234 2013-01-31 20:15
1234 2013-01-31 22:30
1234 2013-02-15 02:30
1234 2013-02-15 06:30
1234 2013-02-15 11:30
1234 2013-02-15 21:30
I am trying to do some daily analysis by custNo. As you can see above, the customer number repeats. One day runs from 2013-01-31 01:00 to 2013-02-01 00:59. I am trying to come up with a query for the FirstVisit time. So for 31 Jan it should be 2013-01-31 20:15, and for 15 Feb it should be 2013-02-15 02:30.
So far I came up with this query:
select custNo, visitTime,
    FirstVisit = (select MIN(c.visitTime)
                  from customer c
                  where c.custNo = ct.custNo
                    and c.visitTime >= '01/01/2013 01:00'
                    and c.visitTime < '03/01/2013 01:00')
from customer ct
where visitTime >= '01/01/2013 01:00'
and visitTime < '03/01/2013 01:00'
The problem with this is that if the custNo repeats, the subquery takes all of that customer's rows into account and returns the overall minimum date, which in the above case is 2013-01-31 20:15 for every row. I also tried min(visitTime) over (partition by custNo, visitTime), but used as a subquery it returns more than one value.
try this:
select custNo, min(visitTime) from customer
group by custNo, CAST(visitTime AS date)
order by custNo
or:
select t1.custNo, t1.visitTime, t2.minVal
from customer t1 left join
(
select custno, min(visitTime) as minVal from customer
group by custno, CAST(visitTime AS date)
) t2 on t1.custNo = t2.custNo and CAST(t2.minVal as date) = CAST(t1.visitTime as date)
order by t1.custNo
Actually, you should just be able to group by customer and date, and select the minimum visit time for each customer and date:
SELECT custNo, MIN(visitTime)
FROM customer c
GROUP BY custNo, CONVERT(DATE, visitTime, 112)
ORDER BY custNo, MIN(visitTime)
Is this what you're looking for?
SELECT custno, min(visit)
FROM visitors
GROUP BY custno, CONVERT(DATE, VISIT)
ORDER BY custno, min(visit)
Here is a SQLFiddle
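The queries above group by calendar date, so they do not reflect the 1:00 AM day boundary described in the question. Here is a minimal sketch of one way to handle it, assuming SQL Server 2008 or later (for the date type and multi-row VALUES) and assuming that shifting each timestamp back by one hour is an acceptable definition of the "day". The temp table #customer and the extra 00:30 row are made up for illustration.

```sql
-- Sample data from the question, plus one hypothetical 00:30 visit
-- that is not in the original data, to exercise the 1:00 AM boundary.
CREATE TABLE #customer (custNo int, visitTime datetime);

INSERT INTO #customer (custNo, visitTime) VALUES
    (1234, '2013-01-31T20:15:00'),
    (1234, '2013-01-31T22:30:00'),
    (1234, '2013-02-01T00:30:00'),  -- hypothetical: before 01:00, so it counts toward 31 Jan
    (1234, '2013-02-15T02:30:00'),
    (1234, '2013-02-15T06:30:00'),
    (1234, '2013-02-15T11:30:00'),
    (1234, '2013-02-15T21:30:00');

-- Shift each timestamp back one hour so that a "day" runs 01:00 to 00:59,
-- then take the earliest visit per customer and shifted date.
-- Every original row is kept; FirstVisit is repeated on each row of the day.
SELECT custNo,
       visitTime,
       FirstVisit = MIN(visitTime) OVER (
           PARTITION BY custNo,
                        CAST(DATEADD(HOUR, -1, visitTime) AS date))
FROM #customer
ORDER BY custNo, visitTime;

DROP TABLE #customer;
```

With this data the first three rows get FirstVisit 2013-01-31 20:15 and the last four get 2013-02-15 02:30. The same shifted expression can be used in the GROUP BY of the aggregate versions above if only one row per day is needed.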
I have a table that contains employee bank data
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement that updates 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, for each row with a null Delta I want to use the value from the most recent row whose date is not later than that row's date.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.
I currently update the table with the following, but I am wondering if there is a more efficient way to do so?
WITH cte_date
AS (SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = t.balance_key
AND tt1.employee_key = t.employee_key
AND tt1.date_key < dd.date_key);
Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide; in this example, at most 7 days.
Comparing query cost relative to the batch, the lag() over() approach was 22% while the Cross Apply was 78%.
Again, Just for fun
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update
Same results with 20 days
Here is another way. Use sum() as a window function to assign each row to a group "Grp" (one row with a non-null Delta followed by its subsequent null rows). Then take max(Delta) over each Grp to return the non-null value.
select Employee, Bank, [Date], max (max(Delta))
over (partition by Employee, Bank, Grp)
from
(
select *, Grp = sum (case when Delta is not null then 1 else 0 end)
over (partition by Employee,Bank
order by [Date])
from YourTable
) t
group by Employee, Bank, [Date], Grp
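Since the question asks for an update rather than a select, the same grouping idea can be written as an UPDATE. This is only a sketch under stated assumptions: SQL Server 2012 or later (for the running sum with ORDER BY in the window), a hypothetical temp table #Balance standing in for the real table, and no claim about how it performs at 2 billion rows.

```sql
-- Hypothetical stand-in for the real table, loaded with the question's sample rows.
CREATE TABLE #Balance (
    Employee varchar(50),
    Bank     varchar(50),
    [Date]   date,
    Delta    decimal(9, 2) NULL
);

INSERT INTO #Balance (Employee, Bank, [Date], Delta) VALUES
    ('Smith', 'Vacation', '2023-01-01', 15.0),
    ('Smith', 'Vacation', '2023-01-02', NULL),
    ('Smith', 'Vacation', '2023-01-03', NULL),
    ('Smith', 'Vacation', '2023-01-04', 7.5);

WITH grouped AS (
    -- Running count of non-null Deltas: each non-null row starts a new Grp,
    -- and the null rows that follow it share the same Grp number.
    SELECT Employee, Bank, [Date], Delta,
           Grp = SUM(CASE WHEN Delta IS NOT NULL THEN 1 ELSE 0 END)
                 OVER (PARTITION BY Employee, Bank ORDER BY [Date])
    FROM #Balance
),
filled AS (
    -- Each Grp holds exactly one non-null Delta, so MAX picks it up.
    SELECT Employee, Bank, [Date],
           FilledDelta = MAX(Delta) OVER (PARTITION BY Employee, Bank, Grp)
    FROM grouped
)
UPDATE b
SET b.Delta = f.FilledDelta
FROM #Balance AS b
JOIN filled AS f
  ON  f.Employee = b.Employee
  AND f.Bank     = b.Bank
  AND f.[Date]   = b.[Date]
WHERE b.Delta IS NULL              -- only touch the gaps
  AND f.FilledDelta IS NOT NULL;   -- leading nulls have nothing to copy

SELECT Employee, Bank, [Date], Delta
FROM #Balance
ORDER BY Employee, Bank, [Date];

DROP TABLE #Balance;
```

After the update, the four sample rows carry Delta values 15.00, 15.00, 15.00 and 7.50. Restricting the UPDATE to rows where Delta is null keeps the write volume down to the gaps themselves.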
I'm trying to get the second-highest monthly sales for every year.
So far I'm getting the second-highest monthly sales for the first year only.
WITH newtable AS
(
SELECT
MONTH(o.orderdate) AS 'MONTH',
YEAR(o.orderdate) AS 'YEAR',
SUM(od.qty*od.unitprice) AS monthSales
FROM Sales.Orders AS o
INNER JOIN Sales.OrderDetails AS od
ON o.orderid = od.orderid
GROUP BY YEAR(o.orderdate), MONTH(o.orderdate)
)
SELECT YEAR, MAX(monthSales) AS secondHighestMonthlySales
FROM newtable
WHERE monthSales < (SELECT MAX(monthSales) FROM newtable)
GROUP BY YEAR;
I need the second highest for every year.
Assuming you have the data correct in newtable, focus on the second query regarding what you want. This is pure SQL:
Test Data:
Year Sales
2010 500
2010 400
2010 600
2011 700
2011 800
2011 900
2012 400
2012 600
2012 500
Query to select the second highest:
select O.year, max(O.sales) as secondhighestsale from Orders O,
(select year, max(sales) as maxsale
from Orders
group by year) A
where O.year = A.year
and O.sales < A.maxsale
group by O.year
Output:
year secondhighestsale
2010 500
2011 800
2012 500
As an alternative, you can use the ROW_NUMBER() function. In this case, I use a common table expression to number the rows within each year by total sales in descending order, then pick row number 2. Each year is a partition.
USE TEMPDB
CREATE TABLE #Sales (SalesYear INT, TotalAmount INT)
INSERT INTO #Sales VALUES (2016, 1100), (2016, 700), (2016, 950),
(2017, 660), (2017, 760), (2017, 460),
(2017, 141), (2018, 999), (2018, 499);
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY SalesYear ORDER BY TotalAmount DESC) AS RowNumb
FROM #Sales
)
SELECT SalesYear,
TotalAmount
FROM CTE WHERE RowNumb = 2
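One caveat with ROW_NUMBER(): if two rows tie for the highest amount in a year, row number 2 is the tied highest value rather than the next lower one. The original query in the question compares with "less than the maximum", which suggests the second distinct value is wanted. A sketch using DENSE_RANK() for that case follows; the temp table #MonthlySales and its rows are hypothetical and stand in for the newtable CTE.

```sql
-- Hypothetical per-month totals; 2016 deliberately has a tie for first place.
CREATE TABLE #MonthlySales (SalesYear int, MonthSales int);

INSERT INTO #MonthlySales (SalesYear, MonthSales) VALUES
    (2016, 1100), (2016, 1100), (2016, 950),
    (2017,  760), (2017,  660), (2017, 460);

WITH ranked AS (
    -- DENSE_RANK gives tied amounts the same rank and leaves no gaps,
    -- so rank 2 is always the second distinct amount within the year.
    SELECT SalesYear,
           MonthSales,
           SalesRank = DENSE_RANK() OVER (PARTITION BY SalesYear
                                          ORDER BY MonthSales DESC)
    FROM #MonthlySales
)
SELECT DISTINCT SalesYear,
       MonthSales AS secondHighestMonthlySales
FROM ranked
WHERE SalesRank = 2
ORDER BY SalesYear;

DROP TABLE #MonthlySales;
```

For this data the result is 950 for 2016 and 660 for 2017, whereas ROW_NUMBER() = 2 would return 1100 for 2016. The DISTINCT guards against duplicate output rows when the second-highest amount itself is tied.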
Table Structure:
Invoice
Invoice Payment
Current Query:
;WITH cte (clientid, invoiceid, paid, disc)
As
(
Select client_id clientId, vinvoice_Id invoiceId, sum(amount_received) paid, sum(discount) disc
From tbl_Vendor_Invoice_Payment
Group by vinvoice_id, client_id
)
Select I.date, I.total_price, Isnull(paid, 0) Paid, (Total_price - Isnull(paid,0) - Isnull(disc,0)) Balance
From tbl_Vendor_invoice I Left join cte On I.client_id = cte.clientId
And I.vinvoice_id = cte.invoiceid
order by vinvoice_id desc
Output:
But my requirement is to get month-wise result of last six months as below:
Month total_price Paid Balance
--------------------------------------------
October 800.00 750.00 50.00
September 200.00 100.00 100.00
August 350.00 350.00 0.00
.........
Can anyone please help me to get this ?
SELECT DATEADD(month,DATEDIFF(month,0,vi.[date]),0) AS month_start, SUM(vi.total_price) total_price, SUM(vip.amount_received) Paid, SUM(vip.balance) balance
FROM tbl_Vendor_Invoice vi
INNER JOIN tbl_Vendor_Invoice_Payment vip
on vi.vInvoice_Id = vip.vInvoice_Id
WHERE vi.[date] >= DATEADD(month, -6, DATEADD(month,DATEDIFF(month,0,GETDATE()),0))
GROUP BY DATEADD(month,DATEDIFF(month,0,vi.[date]),0)
This will group by the first day of the month, including the year. If you want just the month name, you can extract that using DATENAME(month, [date]).
I have found an answer:
;WITH cte (clientid, invoiceid, paid, disc)
As
(
Select client_id clientId, vinvoice_Id invoiceId, sum(amount_received) paid, sum(discount) disc
From tbl_Vendor_Invoice_Payment
Group by vinvoice_id, client_id
)
SELECT DATENAME(month, [date]) mMonth, SUM(Isnull(paid, 0)) Amount_Paid
FROM tbl_Vendor_Invoice vi
INNER JOIN cte vip on vi.vinvoice_Id = vip.invoiceid
WHERE [date] >= DATEADD(month, -6, DATEADD(month,DATEDIFF(month,0,getdate()),0))
GROUP BY DATENAME(month, [date]), MONTH([date])
ORDER BY MONTH([date])
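One thing to watch with ORDER BY MONTH([date]): when the six-month window crosses a year boundary, January sorts before November. Grouping and ordering by the first day of the month avoids that while still showing the month name. The sketch below assumes SQL Server 2008 or later and an English language setting for DATENAME; the flattened temp table #Invoice, its rows, and the fixed @asOf date (used instead of GETDATE() so the result is repeatable) are all hypothetical.

```sql
DECLARE @asOf datetime = '20140215';  -- hypothetical "today"

-- Hypothetical, already-joined invoice data: one row per invoice with its paid total.
CREATE TABLE #Invoice ([date] datetime, total_price decimal(10, 2), paid decimal(10, 2));

INSERT INTO #Invoice ([date], total_price, paid) VALUES
    ('20130601', 999.00,   0.00),  -- older than six months, filtered out
    ('20131105', 200.00, 100.00),
    ('20131220', 350.00, 350.00),
    ('20140110', 500.00, 450.00),
    ('20140120', 300.00, 300.00);

SELECT [Month]     = DATENAME(month, MonthStart),
       total_price = SUM(total_price),
       Paid        = SUM(paid),
       Balance     = SUM(total_price - paid)
FROM (
    -- Truncate each date to the first day of its month.
    SELECT MonthStart = DATEADD(month, DATEDIFF(month, 0, [date]), 0),
           total_price,
           paid
    FROM #Invoice
    WHERE [date] >= DATEADD(month, -6, DATEADD(month, DATEDIFF(month, 0, @asOf), 0))
) AS m
GROUP BY MonthStart
ORDER BY MonthStart DESC;  -- newest month first, correct across a year boundary

DROP TABLE #Invoice;
```

With this data the rows come back as January (800.00, 750.00, 50.00), December (350.00, 350.00, 0.00) and November (200.00, 100.00, 100.00), in that order.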
EDIT: I am using SQL Server 2005
So here's a tricky one. For audit purposes, we need to make 3 attempts to contact a customer. We can go above and beyond and make more than 3 attempts, but for audit purposes I need to retrieve the date of the third most recent attempt for each customer.
In most cases, you just need the most recent period, so you can do something like..
SELECT CustID,MAX(AttemptDate) FROM Attempts GROUP BY CustID
.. but that obviously won't work in this scenario.
Say I have a table of attempts that occur which are tied to a customer.
CustID AttemptDate
123 2014-01-02
123 2014-01-05
123 2014-01-06 * retrieve this one
123 2014-01-07
123 2014-01-10
555 2014-02-01
555 2014-02-03
555 2014-02-07 * retrieve this one
555 2014-02-12
555 2014-02-20
Output:
CustID AttemptDate
123 2014-01-06
555 2014-02-07
Any tips for pulling this off?
;WITH t AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY CustId ORDER BY AttemptDate DESC) AS nth_most_recent
FROM MyTable
)
SELECT *
FROM t
WHERE nth_most_recent = 3
The ROW_NUMBER ranking function is your friend here:
WITH cte (CustId, AttemptDate, AttemptNumber) AS (
SELECT
CustId,
AttemptDate,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY AttemptDate DESC) AS AttemptNumber
FROM Attempts
)
SELECT
CustId,
AttemptDate
FROM cte
WHERE AttemptNumber = 3
Alternatively, if the common table expression syntax is causing problems, you could use a subquery:
SELECT
CustId,
AttemptDate
FROM (
SELECT
CustId,
AttemptDate,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY AttemptDate DESC) AS AttemptNumber
FROM Attempts
) sq
WHERE AttemptNumber = 3
I have some data that looks like this:
id date
--------------------------------
123 2013-04-08 00:00:00.000
123 2013-04-07 00:00:00.000
123 2013-04-06 00:00:00.000
123 2013-04-04 00:00:00.000
123 2013-04-03 00:00:00.000
I need to return a count of the most recent consecutive date streak for a given ID, which in this case would be 3 for id 123. I have no idea if this can be done in SQL. Any suggestions?
The way to do this is to subtract a sequence of row numbers from the dates. The difference is constant within a run of consecutive dates, so it can be used as a group key. Here is an example to get the length of all sequences for an id:
select id, grp, count(*) as NumInSequence, min(date), max(date)
from (select t.*,
(date - row_number() over (partition by id order by date)) as grp
from data t
) t
group by id, grp
To get the longest one, I would use row_number() again:
select t.*
from (select id, grp, count(*) as NumInSequence,
min(date) as mindate, max(date) as maxdate,
row_number() over (partition by id order by count(*) desc) as seqnum
from (select t.*,
(date - row_number() over (partition by id order by date)) as grp
from data t
) t
group by id, grp
) t
where seqnum = 1
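The question asks for the most recent streak rather than the longest one. A sketch of that variant orders the groups by their latest date instead of their length. It assumes at most one row per id and date (duplicates would break the row-number arithmetic) and uses DATEADD so it also works when the column is of type date. The temp table #data is a stand-in loaded with the question's rows.

```sql
CREATE TABLE #data (id int, [date] datetime);

INSERT INTO #data (id, [date]) VALUES
    (123, '20130408'),
    (123, '20130407'),
    (123, '20130406'),
    (123, '20130404'),
    (123, '20130403');

SELECT id, StreakLength, StreakStart, StreakEnd
FROM (
    SELECT id,
           StreakLength = COUNT(*),
           StreakStart  = MIN([date]),
           StreakEnd    = MAX([date]),
           -- Rank the streaks of each id by how recently they ended.
           seqnum = ROW_NUMBER() OVER (PARTITION BY id ORDER BY MAX([date]) DESC)
    FROM (
        -- date minus row number is constant within a run of consecutive days.
        SELECT id,
               [date],
               grp = DATEADD(day,
                             -CAST(ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) AS int),
                             [date])
        FROM #data
    ) AS numbered
    GROUP BY id, grp
) AS streaks
WHERE seqnum = 1;

DROP TABLE #data;
```

For id 123 this returns a streak of length 3, running from 2013-04-06 to 2013-04-08. In this sample the most recent streak is also the longest, so both variants give the same row.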