I have a table like this. Consider that I have around 5 million records.
Transaction id|Amount|CustomerId|date
1 | 100 | 20 |1/1/2012
2 | 230 | 30 |2/2/2012
3 | 320 | 20 |2/3/2012
etc...
How can I find the total amount for the last 5 transactions of each customer in each quarter of 2012?
Output: Quarter|Customerid|totalAmount
1 | 20 | 40000
1 | 30 | 300000
2 ...etc...
Please suggest an efficient method.
You should post the DDL, but you can try something like this. It should work:
with mycte as
(
    select customerid,
           datepart(qq, dt) as qtr,
           amount,
           row_number() over (partition by datepart(qq, dt), customerid
                              order by dt desc, transactionid desc) as rn
    from yourtable                    -- replace with your real table name
    where dt >= '20120101'
      and dt <  '20130101'            -- keep only 2012
)
select qtr, customerid, sum(amount) as amt
from mycte
where rn <= 5
group by qtr, customerid
If you want someone else to write efficient queries for you, then you have to do some of the hard work yourself: provide the DDL, the indexes, some sample data, and the approaches you have tried so far.
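For reference, a script of the kind the answer asks for might look like the sketch below. The table name, column names, and types here are assumptions based on the question, not the real schema:

-- Assumed DDL and sample data (names and types are guesses from the question)
create table yourtable
(
    transactionid int           not null primary key,
    amount        decimal(18,2) not null,
    customerid    int           not null,
    dt            date          not null
);

insert into yourtable (transactionid, amount, customerid, dt)
values (1, 100, 20, '20120101'),
       (2, 230, 30, '20120202'),
       (3, 320, 20, '20120302');

-- An index like this would support the partition/order used in the query above
create index ix_yourtable_customerid_dt
    on yourtable (customerid, dt desc)
    include (amount);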
Related
I need to select or update badge records whose timestamp is more than 30 days after the previous visit. A SELECT query to find them is fine; I can then update them myself.
Difficult to explain in detail but I'll try with an example:
(This is an access system where people scan a badge and the timestamp is recorded.)
I only need to know the records where a badge has entered the system more than 30 days after its previous scan, plus the very first scan.
The example table below shows the records I need (5 records).
Only records of the same badge number must be compared and updated.
Is this possible using T-SQL?
Example:
+------------------+--------------+
| TimeStamp | Badge |
+------------------+--------------+
| 19-10-2022 10:18 | Badge1 | <--- **select** (more than 30 days after previous scan)
| 01-01-2022 12:18 | Badge1 | <--- ok (less than 30 days)
| 08-12-2021 13:23 | Badge1 | <--- ok (less than 30 days)
| 20-11-2021 11:18 | Badge1 | <--- ok (less than 30 days)
| 22-10-2021 13:18 | Badge1 | <--- **select** (more than 30 days after previous scan)
| 23-08-2020 14:18 | Badge1 | <--- **select** (first entrance)
| 01-01-2022 09:18 | Badge12 | <--- ok (less than 30 days)
| 02-12-2021 10:18 | Badge12 | <--- **select** (more than 30 days after previous scan)
| 29-10-2021 23:18 | Badge12 | <--- ok (less than 30 days)
| 25-10-2021 12:18 | Badge12 | <--- **select** (first entrance)
+------------------+--------------+
Use this fiddle for the example DB and my non-working attempt: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c1528618004f0fe6bb6319e8e638abae
Help others help you. Post a script that contains DDL and sample data that can be used as the basis for writing code.
with cte as (
select *, ROW_NUMBER() over (partition by Badge order by Timestamp) as rno
from #x
)
select cte.*, prior.rno as prno, datediff(day, prior.TimeStamp, cte.Timestamp) as ddif
from cte
left join cte as prior on cte.badge = prior.badge and cte.rno - 1 = prior.rno
where cte.rno = 1 or datediff(day, prior.TimeStamp, cte.Timestamp) > 30
order by cte.Badge, cte.TimeStamp;
This should work, but I have no way of testing on 2008; see the fiddle below to demonstrate. Comment out the WHERE clause to see all the rows and the columns that are computed for the query logic. This uses ROW_NUMBER to generate a sequence number and then simply self-joins on that value to simulate LAG.
updated fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a24d23f54030d7aadd8f889819cd4512
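On SQL Server 2012 or later, LAG would do the same thing without the self-join. A minimal sketch, assuming the same #x table (TimeStamp, Badge) as above:

with cte as (
    select *,
           lag(TimeStamp) over (partition by Badge order by TimeStamp) as PrevTimeStamp  -- previous scan for the same badge
    from #x
)
select cte.*, datediff(day, PrevTimeStamp, TimeStamp) as ddif
from cte
where PrevTimeStamp is null                             -- first entrance
   or datediff(day, PrevTimeStamp, TimeStamp) > 30      -- more than 30 days after previous scan
order by Badge, TimeStamp;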
;WITH Ordered AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY Badge ORDER BY CONVERT(DATETIME, [scandate] ,103) DESC) rn
FROM History
)
SELECT M.*, DATEDIFF(dd, CONVERT(DATETIME, P.[scandate], 103), CONVERT(DATETIME, M.[scandate], 103)) AS DaysGap
FROM Ordered M
LEFT JOIN Ordered P
    ON M.rn = P.rn - 1
    AND M.Badge = P.Badge
WHERE P.rn IS NULL -- first entrance
    OR DATEDIFF(dd, CONVERT(DATETIME, P.[scandate], 103), CONVERT(DATETIME, M.[scandate], 103)) > 30  -- convert with style 103, matching the ordering above
I have an sql table with the below data:
Id department Amount
1 Accounting 10000
2 Catering 5000
3 Cleaning 5000
I want to return the data as below:
Id department Amount
1 Accounting 10000
1 50%
2 Catering 5000
2 25%
3 Cleaning 5000
3 25%
That is, every record should return a second record just below it that displays its percentage of the total amount. I have tried to use PIVOT, but I still cannot position
the second row just below the related first one.
Has anyone done something similar? I just need some guidelines.
create table #T(Id int, Dept varchar(10), Amount int)
insert into #T
values (1,'Accounting',10000),(2,'Catering',5000),(3,'Cleaning',5000)

declare @Total float = (select sum(Amount) from #T)

select *
from #T
union
select Id, convert(varchar(50), (Amount/@Total)*100) + '%', 0
from #T
order by Id, Amount desc
Use a CTE to calculate the total of the amounts.
Then use UNION ALL for your table and the query which calculates the percentages:
with cte as (select sum(amount) sumamount from tablename)
select id, department, amount
from tablename
union all
select id, concat(100 * amount / (select sumamount from cte), '%'), null
from tablename
order by id, amount desc
See the demo.
Results:
id | department | amount
1  | Accounting | 10000
1  | 50%        | null
2  | Catering   | 5000
2  | 25%        | null
3  | Cleaning   | 5000
3  | 25%        | null
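On SQL Server 2012 and later you could also skip the separate total query by using a window aggregate; a minimal sketch along the same lines, assuming the same tablename:

select id, department, amount
from tablename
union all
select id,
       concat(100 * amount / sum(amount) over (), '%'),  -- grand total via a window aggregate
       null
from tablename
order by id, amount desc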
I can use a traditional subquery approach to count the occurrences in the last ten minutes. For example, this:
drop table if exists [dbo].[readings]
go
create table [dbo].[readings](
[server] [int] NOT NULL,
[sampled] [datetime] NOT NULL
)
go
insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go
-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server, sampled,
       (select count(*)
        from readings r2
        where r2.server = r1.server
          and r2.sampled <= r1.sampled
          and r2.sampled > dateadd(minute, -10, r1.sampled)) as countinlast10minutes
from readings r1
order by server, sampled
go
How can I use a window function to obtain the same result? I've tried this:
select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled
But the result is just the running count. Is there any system variable that refers to the current row, something like currentrow.sampled?
This isn't a very pleasing answer, but one possibility is to first create a helper table with all the minutes:
CREATE TABLE #DateTimes(datetime datetime primary key);
WITH E1(N) AS
(
SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1)) V(N)
) -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
,R(StartRange, EndRange)
AS (SELECT MIN(sampled),
MAX(sampled)
FROM readings)
,N(N)
AS (SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) AS N
FROM E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM N,
R;
And then with that in place you could use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
WITH T1 AS
( SELECT Server,
MIN(sampled) AS StartRange,
MAX(sampled) AS EndRange
FROM readings
GROUP BY Server )
SELECT Server,
sampled,
Cnt
FROM T1
CROSS APPLY
( SELECT r.sampled,
COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
FROM #DateTimes N
LEFT JOIN readings r
ON r.sampled = N.datetime
AND r.server = T1.server
WHERE N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE CA.sampled IS NOT NULL
ORDER BY sampled
The above assumes that there is at most one sample per minute and that all the times are exact minutes. If this isn't true it would need another table expression pre-aggregating by datetimes rounded to the minute.
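A possible shape for that pre-aggregation (a sketch, not part of the original answer): group the readings to one row per server and minute, then have #DateTimes LEFT JOIN this CTE instead of readings and use SUM(SamplesInMinute) in place of COUNT(r.sampled) in the window:

-- Sketch: pre-aggregate readings to one row per server and minute
WITH PerMinute AS
(   SELECT server,
           DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) AS sampled_minute,  -- round down to the minute
           COUNT(*) AS SamplesInMinute
    FROM readings
    GROUP BY server, DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0)
)
SELECT server, sampled_minute, SamplesInMinute
FROM PerMinute
ORDER BY server, sampled_minute;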
As far as I know, there is not a simple exact replacement for your subquery using window functions.
Window functions operate on a set of rows and allow you to work with them based on partitions and order.
What you are trying to do isn't the type of partitioning that we can work with in window functions.
Generating the partitions we would need in order to use window functions in this instance would just result in overly complicated code.
I would suggest cross apply() as an alternative to your subquery.
I am not sure if you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what is happening in your original subquery.
Here is what a window function could look like based on partitioning your samples into 10 minute windows, along with a cross apply() version.
select
r.server
, r.sampled
, CrossApply = x.CountRecent
, OriginalSubquery = (
select count(*)
from readings s
where s.server=r.server
and s.sampled <= r.sampled
/* doesn't include 10 minutes ago */
and s.sampled > dateadd(minute,-10,r.sampled)
)
, Slices = count(*) over(
/* partition by server, 10 minute slices, not the same thing*/
partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
order by sampled
)
from readings r
cross apply (
select CountRecent=count(*)
from readings i
where i.server=r.server
/* changed to >= */
and i.sampled >= dateadd(minute,-10,r.sampled)
and i.sampled <= r.sampled
) as x
order by server,sampled
results: http://rextester.com/BMMF46402
+--------+---------------------+------------+------------------+--------+
| server | sampled | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
| 1 | 01.01.2017 08:00:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:02:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:05:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:30:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:31:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:37:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:40:00 | 4 | 3 | 1 |
| 1 | 01.01.2017 08:41:00 | 4 | 3 | 2 |
| 1 | 01.01.2017 09:07:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 09:08:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 09:09:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 09:11:00 | 4 | 4 | 1 |
+--------+---------------------+------------+------------------+--------+
Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I'm thinking this would lead to much simpler and more natural SQL:
select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)
We can already use expressions like this:
select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)
so I'm thinking it would be great if we could reference a column that's in the current row.
I have a database with job numbers, scheduled dates, and scheduled hours, such as this:
J410 | 11/14/2016 | 50|
I have been asked to produce a report with one line for each day of the job like this:
J410 | 11/14/2016 | 10 |
J410 | 11/15/2016 | 10 |
J410 | 11/16/2016 | 10 |
J410 | 11/17/2016 | 10 |
J410 | 11/18/2016 | 10 |
The logic is that we assume 10-hour days, so the total number of hours divided by 10 gives the number of days, and the users want a line for each day.
I can easily get the number of days like this:
SELECT CEILING(Hours / 10.0) - Note that some hours don't divide evenly by 10 so I am rounding up.
I don't have the slightest idea how to attack the problem of creating (for reporting only) additional lines for each date.
My initial thoughts are to select the records into a temp table and then select each record and use a WHILE statement to duplicate the records until the number of days have been reached.
Can anyone provide a better idea?
If it helps
Declare @YourTable table (JobNumber varchar(25), Date date, Hours int)
Insert Into @YourTable values
('J410','11/14/2016',50)
;with cte0(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N))
,cteN(N) As (Select Row_Number() over (Order By (Select NULL)) From cte0 N1, cte0 N2, cte0 N3)
Select A.JobNumber
      ,Date  = DateAdd(DD, N-1, Date)
      ,Hours = cast(Hours/CEILING(Hours/10.0) as decimal(10,2))
From @YourTable A
Join cteN B on N <= CEILING(Hours/10.0)
Returns
JobNumber Date Hours
J410 2016-11-14 10.00
J410 2016-11-15 10.00
J410 2016-11-16 10.00
J410 2016-11-17 10.00
J410 2016-11-18 10.00
Use a Numbers Table and add a day to your existing table until the date limit is reached...
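A minimal sketch of that idea, assuming a permanent dbo.Numbers table (a single column N = 1, 2, 3, ...) and the same @YourTable as in the answer above:

-- Numbers-table variant (assumes dbo.Numbers exists with column N = 1, 2, 3, ...)
Select A.JobNumber
      ,Date  = DateAdd(DD, N.N - 1, A.Date)
      ,Hours = cast(A.Hours / CEILING(A.Hours / 10.0) as decimal(10,2))
From @YourTable A
Join dbo.Numbers N on N.N <= CEILING(A.Hours / 10.0)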
I have the following table in my database:
Month|Year | Value
1 |2013 | 100
4 |2013 | 101
8 |2013 | 102
2 |2014 | 103
4 |2014 | 104
How can I fill in "missing" rows from the data, so that if I query from 2013-03 through 2014-03, I would get:
Month|Year | Value
3 |2013 | 100
4 |2013 | 101
5 |2013 | 101
6 |2013 | 101
7 |2013 | 101
8 |2013 | 102
9 |2013 | 102
10 |2013 | 102
11 |2013 | 102
12 |2013 | 102
1 |2014 | 102
2 |2014 | 103
3 |2014 | 103
As you can see I want to repeat the previous Value for a missing row.
I have created a SQL Fiddle of this solution for you to play with.
Essentially it creates a work table, #Months, and then cross joins it with all of the years in your data set. This produces a complete list of all months for all years. I then left join the test data from your example (table named TEST - see the SQL Fiddle for the schema) back into this list, which gives me a complete list with values for the months that have them. The next issue to overcome was using the previous month's value when the current month has none. For that, I used a correlated sub-query, i.e. joined tblValues back onto itself only where it matches the maximum earlier Rank of a row that has a value. This gives a complete result set!
If you want to filter by year/month you can add this in a WHERE clause just before the final Order By.
Enjoy!
Test Schema
CREATE TABLE TEST( Month tinyint, Year int, Value int)
INSERT INTO TEST(Month, Year, Value)
VALUES
(1,2013,100),
(4,2013,101),
(8,2013,102),
(2,2014,103),
(4,2014,104)
Query
DECLARE @Months Table(Month tinyint)
Insert into @Months(Month) Values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
With tblValues as (
select Rank() Over (ORDER BY y.Year, m.Month) as [Rank],
m.Month,
y.Year,
t.Value
from @Months m
CROSS JOIN ( Select Distinct Year from Test ) y
LEFT JOIN Test t on t.Month = m.Month and t.Year = y.Year
)
Select t.Month, t.Year, COALESCE(t.Value, t1.Value) as Value
from tblValues t
left join tblValues t1 on t1.Rank = (
Select Max(tmax.Rank)
From tblValues tmax
Where tmax.Rank < t.Rank AND tmax.Value is not null)
Order by t.Year, t.Month
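For example, restricting to the 2013-03 through 2014-03 range from the question could look like this sketch (the WHERE goes just before the final Order By):

Select t.Month, t.Year, COALESCE(t.Value, t1.Value) as Value
from tblValues t
left join tblValues t1 on t1.Rank = (
    Select Max(tmax.Rank)
    From tblValues tmax
    Where tmax.Rank < t.Rank AND tmax.Value is not null)
Where (t.Year * 100 + t.Month) between 201303 and 201403  -- 2013-03 .. 2014-03
Order by t.Year, t.Month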