SQL Server, select with group by and date column - sql-server

I have tbl_Orders:
OrderNumber ProductCode Qty OrderDate SentDate
---------------------------------------------------------------
1000 A 100 2018-03-01 00:30:51 2018-03-02
1000 A 150 2018-03-12 04:31:54 2018-03-13
1000 B 150 2018-03-11 03:34:51 2018-03-14
1001 C 200 2018-03-01 00:30:51 2018-04-02
1002 D 200 2018-03-01 00:30:51 2018-04-15
I want to write a query to get result like:
OrderNumber Qty MAXOrderDate MAXSentDate
------------------------------------------------
1000 400 2018-03-12 04:31:54 2018-03-14
1001 200 2018-03-01 00:30:51 2018-04-02
1002 200 2018-03-01 00:30:51 2018-04-15
From my newbie perspective it looks like I need 'group by OrderNumber'. But I am not sure what else. And the second problematic thing is "MAXOrderDate" or "MAXSentDate". I also don't know how to select max value from Date.
I really appreciate any help.

Simple,
It should be - As long as you are aggregating data you need not group by that column (eg.. min, max, avg etc.,.. so your original idea of groupby ordernumber should be sufficient)
Try the sql
select
tbl_Orders.ordernumber as OrderNumber
, sum(tbl_Orders.qty) as qty
,max(tbl_Orders.orderdate) as orderdate
,max(tbl_Orders.sentdate) as sentdate
from tbl_Orders
group by
tbl_Orders.ordernumber

You have to sum Qty, max on MAXOrderDate and max on MAXSentDate and then group by OrderNumber, like:
SELECT OrderNumber, sum(Qty) As Qty, max(MAXOrderDate) As OrderDate, max(MAXSentDate) AS SentDate
FROM tbl_Orders (NOLOCK)
GROUP BY OrderNumber
I would use NOLOCK to avoid other queries to be blocked in hight performance environments. Be aware that NOLOCK could bring in the result records that are in transactions that are not yet commited.

Related

Error on Group by method

I wrote a query to combine records in multiple tables. Tables named by Purchase Order, Purchase Order Item
[ Note: The column names are not original names, it just for a model data]
In purchase order table have the order details like this,
id date vendorid totalitems totalqty grossamnt netamnt taxamt
----------------------------------------------------------------------------
1 03/10/17 00001 2 6 12000 13000 1000
Purchase Order Item table have the order details like this,
poid id productcode qty rate tax(%) taxamnt total
--------------------------------------------------------
1 1 12001 3 6000 2.5 500 6500
2 1 12000 3 6000 2.5 500 6500
My Query is,
select po.POID,po.SupplierId,po.TotalItems from
PurchaseOrder po, PurchaseOrderItem poi where po.POID=poi.POID group by
po.POID, po.SupplierId,po.TotalItems
Query returns,
id vendorid totalitems
--------------------------
1 00001 2
1 00001 2
Expected Output is,
id vendorid totalitems
------------------------
1 00001 2
You are using an outdated join method, have a read here:
ANSI vs. non-ANSI SQL JOIN syntax
You are also joining to another table, but never use it:
select po.POID,po.SupplierId,po.TotalItems
from PurchaseOrder po, PurchaseOrderItem poi
where po.POID=poi.POID
group by po.POID, po.SupplierId,po.TotalItems
Can just be:
select po.POID,po.SupplierId,po.TotalItems
from PurchaseOrder po
group by po.POID, po.SupplierId,po.TotalItem
OR
select DISTINCT
po.POID,
po.SupplierId,
po.TotalItems
from PurchaseOrder po

Finding max date difference on a single column

in the below table example - Table A, we have entries for four different ID's 1,2,3,4 with the respective status and its time. I wanted to find the "ID" which took the maximum amount of time to change the "Status" from Started to Completed. In the below example it is ID = 4. I wanted to run a query and find the results, where we currently has approximately million records in a table. It would be really great, if someone provide an effective way to retrieve this data.
Table A
ID Status Date(YYYY-DD-MM HH:MM:SS)
1. Started 2017-01-01 01:00:00
1. Completed 2017-01-01 02:00:00
2. Started 2017-10-02 03:00:00
2. Completed 2017-10-02 05:00:00
3. Started 2017-15-03 06:00:00
3. Completed 2017-15-03 09:00:00
4. Started 2017-22-04 10:00:00
4. Completed 2017-22-04 15:00:00
Thanks!
Bruce
You can query as below:
Select top 1 with ties Id from #yourDate y1
join #yourDate y2
On y1.Id = y2.Id
and y1.[STatus] = 'Started'
and y2.[STatus] = 'Completed'
order by Row_number() over(order by datediff(mi,y1.[Date], y2.[date]) desc)
SELECT
started.ID, timediff(completed.date, started.date) as elapsed_time
FROM TABLE_A as started
INNER JOIN TABLE_A as completed ON (completed.ID=started.ID AND completed.status='Completed')
WHERE started.status='Started'
ORDER BY elapsed_time desc
be sure there's a index on TABLE_A for the columns ID, date
I haven't run this sql but it may solve your problem.
select a.id, max(DATEDIFF(SECOND, a.date, b.date + 1)) from TableA as a
join TableA as b on a.id = b.id
where a.status="started" and b.status="completed"
Here's a way with a correlated sub-query. Just uncomment the TOP 1 to get ID 4 in this case. This is based off your comments that there is only 1 "started" record, but could be multiple "completed" records for each ID.
declare #TableA table (ID int, Status varchar(64), Date datetime)
insert into #TableA
values
(1,'Started','2017-01-01 01:00:00'),
(1,'Completed','2017-01-01 02:00:00'),
(2,'Started','2017-02-10 03:00:00'),
(2,'Completed','2017-02-10 05:00:00'),
(3,'Started','2017-03-15 06:00:00'),
(3,'Completed','2017-03-15 09:00:00'),
(4,'Started','2017-04-22 10:00:00'),
(4,'Completed','2017-04-22 15:00:00')
select --top 1
s.ID
,datediff(minute,s.Date,e.EndDate) as TimeDifference
from #TableA s
inner join(
select
ID
,max(Date) as EndDate
from #TableA
where Status = 'Completed'
group by ID) e on e.ID = s.ID
where
s.Status = 'Started'
order by
datediff(minute,s.Date,e.EndDate) desc
RETURNS
+----+----------------+
| ID | TimeDifference |
+----+----------------+
| 4 | 300 |
| 3 | 180 |
| 2 | 120 |
| 1 | 60 |
+----+----------------+
If you know that 'started' will always be the earliest point in time for each ID and the last 'completed' record you are considering will always be the latest point in time for each ID, the following should have good performance for a large number of records:
SELECT TOP 1
id
, DATEDIFF(s, MIN([Date]), MAX([date])) AS Elapsed
FROM #TableA
GROUP BY ID
ORDER BY DATEDIFF(s, MIN([Date]), MAX([date])) DESC

sql server 2012 merge values from two tables

I have two tables
tblA(sn, ID int pk, name varchar(50), amountA decimal(18,2))
and
tblB(ID int fk, amountB decimal(18,2))
here: tblA occures only once and tblB may occure multiple time
I need the query to display data like:
sn ID name AmountA amountB Balance
1 1001 abc 5000.00 5000.00
2 1002 xyz 10000.00
1002 4000.00 6000.00 (AmountA-AmountB)
3 1003 pqr 15000.00
1003 4000.00
1003 3000.00
1003 2000.00 6000.00 (AmountA-sum(AmountB))
Please ask if any confusion
I tried using lag and lead function but I couldnot get the desire result, Please help.
Since you are using SQL Server 2012, you can use a partition with an aggregate function (SUM):
SELECT t.sn,
t.ID,
t.name,
t.credits AS AmountA,
t.debits AS amountB,
SUM(t.credits - t.debits) OVER (PARTITION BY t.ID ORDER BY t.debits, t.credits) AS Balance
FROM
(
SELECT sn,
ID,
name,
AmountA AS credits,
0 AS debits
FROM tblA
UNION ALL
SELECT 0 AS sn,
ID,
NULL AS name,
0 AS credits,
amountB AS debits
FROM tblB
) t
ORDER BY t.ID,
t.debits,
t.credits
Explanation:
Since the records in tables A and B each represent a single transaction (i.e. a credit or debit), using a UNION query to bring both sets of data into a single table works well here. After this, I compute a rolling sum using the difference between credit and debit, for each record, for each ID partition group. The ordering is chosen such that credits appear at the top of each partition while debits appear on the bottom.

How to subtact a Sum Amount from my table a from a value in my table b

tableA:
tranactionid staffid amount
1001 19052 2000
1002 19043 3000
1004 19076 4000
tableB:
BudgetCode Budget
271098 20000
I want to sum Amount in tableA and subtract it from Budget value 20000
20000 - 9000 = 11000
This will work if TableB has just one row forever, but it's likely your requirements go beyond that:
SELECT tableB.Budget - (SELECT SUM(amount) FROM tableA)
FROM tableB.Budget;
You have to use:
SELECT (SELECT [Budget] FROM tableB) - (SELECT SUM([amount]) FROM tableA) AS [any_name]

SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities.
I have a database view that unions data from all of these tables giving us the ActivityView that looks like this.
As you can see some activities overlap - for example while attending an interview, a client may have been performing a CV update activity.
+----------------------+---------------+---------------------+-------------------+
| activity_client_id | activity_type | activity_start_date | activity_end_date |
+----------------------+---------------+---------------------+-------------------+
| 112 | Interview | 2015-06-01 09:00 | 2015-06-01 11:00 |
| 112 | CV updating | 2015-06-01 09:30 | 2015-06-01 11:30 |
| 112 | Course | 2015-06-02 09:00 | 2015-06-02 16:00 |
| 112 | Interview | 2015-06-03 09:00 | 2015-06-03 10:00 |
+----------------------+---------------+---------------------+-------------------+
Each client has a "Sign Up Date", recorded on the client table, which is when they joined our programme. Here it is for our sample client:
+-----------+---------------------+
| client_id | client_sign_up_date |
+-----------+---------------------+
| 112 | 2015-05-20 |
+-----------+---------------------+
I need to create a report that will show the following columns:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
We need this report in order to see how effective our programme is. An important aim of the programme is that we get every client to complete at least 5 hours of activity as quickly as possible.
So this report will tell us how long from sign up does it take each client to achieve this figure.
What makes this even trickier is that when we calculate 5 hours of total activity, we must discount overlapping activities:
In the sample data above the client attended an interview between 09:00 and 11:00.
On the same day they also performed CV updating activity from 09:30 to 11:30.
For our calculation, this would give them total activity for the day of 2.5 hours (150 minutes) - we would only count 30 minutes of the CV updating as the Interview overlaps it up to 11:00.
So the report for our sample client would give the following result:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
| 112 | 2015-05-20 | 2015-06-02 |
+-----------+---------------------+--------------------------------------------+
So my question is how can I create the report using a select statement ?
I can work out how to do this by writing a stored procedure that will loop through the view and write the result to a report table.
But I would much prefer to avoid a stored procedure and have a select statement that will give me the report on the fly.
I am using SQL Server 2005.
See SQL Fiddle here.
with tbl as (
-- this will generate daily merged ovelaping time
select distinct
a.id
,(
select min(x.starttime)
from act x
where x.id=a.id and ( x.starttime between a.starttime and a.endtime
or a.starttime between x.starttime and x.endtime )
) start1
,(
select max(x.endtime)
from act x
where x.id=a.id and ( x.endtime between a.starttime and a.endtime
or a.endtime between x.starttime and x.endtime )
) end1
from act a
), tbl2 as
(
-- this will add minute and total minute column
select
*
,datediff(mi,t.start1,t.end1) mi
,(select sum(datediff(mi,x.start1,x.end1)) from tbl x where x.id=t.id and x.end1<=t.end1) totalmi
from tbl t
), tbl3 as
(
-- now final query showing starttime and endtime for 5 hours other wise null in case not completed 5(300 minutes) hours
select
t.id
,min(t.start1) starttime
,min(case when t.totalmi>300 then t.end1 else null end) endtime
from tbl2 t
group by t.id
)
-- final result
select *
from tbl3
where endtime is not null
This is one way to do it:
;WITH CTErn AS (
SELECT activity_client_id, activity_type,
activity_start_date, activity_end_date,
ROW_NUMBER() OVER (PARTITION BY activity_client_id
ORDER BY activity_start_date) AS rn
FROM activities
),
CTEdiff AS (
SELECT c1.activity_client_id, c1.activity_type,
x.activity_start_date, c1.activity_end_date,
DATEDIFF(mi, x.activity_start_date, c1.activity_end_date) AS diff,
ROW_NUMBER() OVER (PARTITION BY c1.activity_client_id
ORDER BY x.activity_start_date) AS seq
FROM CTErn AS c1
LEFT JOIN CTErn AS c2 ON c1.rn = c2.rn + 1
CROSS APPLY (SELECT CASE
WHEN c1.activity_start_date < c2.activity_end_date
THEN c2.activity_end_date
ELSE c1.activity_start_date
END) x(activity_start_date)
)
SELECT TOP 1 client_id, client_sign_up_date, activity_start_date,
hoursOfActivicty
FROM CTEdiff AS c1
INNER JOIN clients AS c2 ON c1.activity_client_id = c2.client_id
CROSS APPLY (SELECT SUM(diff) / 60.0
FROM CTEdiff AS c3
WHERE c3.seq <= c1.seq) x(hoursOfActivicty)
WHERE hoursOfActivicty >= 5
ORDER BY seq
Common Table Expressions and ROW_NUMBER() were introduced with SQL Server 2005, so the above query should work for that version.
Demo here
The first CTE, i.e. CTErn, produces the following output:
client_id activity_type start_date end_date rn
112 Interview 2015-06-01 09:00 2015-06-01 11:00 1
112 CV updating 2015-06-01 09:30 2015-06-01 11:30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 4
The second CTE, i.e. CTEdiff, uses the above table expression in order to calculate time difference for each record, taking into consideration any overlapps with the previous record:
client_id activity_type start_date end_date diff seq
112 Interview 2015-06-01 09:00 2015-06-01 11:00 120 1
112 CV updating 2015-06-01 11:00 2015-06-01 11:30 30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 420 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 60 4
The final query calculates the cumulative sum of time difference and selects the first record that exceeds 5 hours of activity.
The above query will work for simple interval overlaps, i.e. when just the end date of an activity overlaps the start date of the next activity.
A Geometric Approach
For another issue, I've taken a geometric approach to date
packing. Namely, I convert dates and times to a sql geometry
type and utilize geometry::UnionAggregate to merge the ranges.
I don't believe this will work in sql-server 2005. But your
problem was such an interesting puzzle that I wanted to see
whether the geometrical approach would work. So any future
users running into this problem that have access to a later
version can consider it.
Code Description
In 'numbers':
I build a table representing a sequence
Swap it out with your favorite way to make a numbers table.
For a union operation, you won't ever need more rows than in
your original table, so I just use it as the base to build it.
In 'mergeLines':
I convert the dates to floats and use those floats
to create geometrical points.
I then connect these points via STUnion and STEnvelope.
Finally, I merge all these lines via UnionAggregate. The resulting
'lines' geometry object might contain multiple lines, but if they
overlap, they turn into one line.
In 'redate':
I use the numbers CTE to extract the individual lines inside 'lines'.
I envelope the lines which here ensures that the lines are stored
only as its two endpoints.
I read the endpoint x values and convert them back to their time
representations (This is usually the end goal, but you need more).
I calculate the difference in minutes between activity start and
end dates (I do this first in seconds then divide by 60 for the
sake of a precision issue).
I calculate the cumulative sume of these minutes for each row.
In the outer query:
I align the previous cumulative minutes sum with each current row
I filter for the row where the 5hr goal was met but where the
previous minutes shows that the 5hr goal for the previous row
was not met.
I then calculate where in the current row's range the user has
met the 5 hours, to not only arrive at the date the five hour
goal was met, but the exact time.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #activities -- where I put your data
),
mergeLines as (
select activity_client_id,
lines = geometry::UnionAggregate(line)
from #activities
cross apply (select
startP = geometry::Point(convert(float,activity_start_date), 0, 0),
stopP = geometry::Point(convert(float,activity_end_date), 0, 0)
) pointify
cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
group by activity_client_id
),
redate as (
select client_id = activity_client_id,
activities_start_date,
activities_end_date,
minutes,
rollingMinutes = sum(minutes) over(
partition by activity_client_id
order by activities_start_date
rows between unbounded preceding and current row
)
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
activities_start_date = convert(datetime, l.line.STPointN(1).STX),
activities_end_date = convert(datetime, l.line.STPointN(3).STX)
) unprepare
cross apply (select minutes =
round(datediff(s, activities_start_date, activities_end_date) / 60.0,0)
) duration
)
select client_id,
activities_start_date,
activities_end_date,
met_5hr_goal = dateadd(minute, (60 * 5) - prevRoll, activities_start_date)
from (
select *,
prevRoll = lag(rollingMinutes) over (
partition by client_id
order by rollingMinutes
)
from redate
) ranker
where rollingMinutes >= 60 * 5
and prevRoll < 60 * 5;

Resources