Date difference between events in same column - sql-server

I am new to MS SQL and I am having difficulty in getting correct results. I currently have a table with headers updateddate(date), addedby(number), operationid(number) and servicereqno(number) that have been selected from three different tables.
The operationid is an event and the servicereqno is a job number. I want to be able to get a time difference taken between events and then once this is established calculate the average.
For example:
opid 512 is at 20:15 - servicereqno 1
opid 535 is at 21:23 - servicereqno 1
My first task will be to determine what the difference in time is, ensuring it is within the same job number.
Many thanks

In order to get the timespan between events, you need to number all operations sequentially (one increments without leftout), and then join that on itself with an offset of 1. You'll get n-1 rows as result with the timespan inbetween.
Something like this:
WITH cteOps AS (
SELECT ROW_NUMBER() OVER (PARTITION BY servicereqno ORDER BY updateddate) seqid, updateddate, servicereqno
FROM yourdatasource
)
SELECT DATEDIFF(millisecond, o1.updateddate, o2.updateddate) updateddatediff, servicereqno
FROM cteOps o1
JOIN cteOps o2 ON o1.seqid=o2.seqid+1 AND o1.servicereqno=o2.servicereqno;
Of course you can perform aggregates on that to get the average or whatever you need.

SQL Server provides a built-in DATEDIFF function, which you can use to find the difference between two dates. Furthermore, SQL Server has a built-in AVG function that you can use to calculate the average of a set of values..

Related

how to select first rows distinct by a column name in a sub-query in sql-server?

Actually I am building a Skype like tool wherein I have to show last 10 distinct users who have logged in my web application.
I have maintained a table in sql-server where there is one field called last_active_time. So, my requirement is to sort the table by last_active_time and show all the columns of last 10 distinct users.
There is another field called WWID which uniquely identifies a user.
I am able to find the distinct WWID but not able to select the all the columns of those rows.
I am using below query for finding the distinct wwid :
select distinct(wwid) from(select top 100 * from dbo.rvpvisitors where last_active_time!='' order by last_active_time DESC) as newView;
But how do I find those distinct rows. I want to show how much time they are away fromm web apps using the diff between curr time and last active time.
I am new to sql, may be the question is naive, but struggling to get it right.
If you are using proper data types for your columns you won't need a subquery to get that result, the following query should do the trick
SELECT TOP 10
[wwid]
,MAX([last_active_time]) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE
[last_active_time] != ''
GROUP BY
[wwid]
ORDER BY
[last_active_time] DESC
If the column [last_active_time] is of type varchar/nvarchar (which probably is the case since you check for empty strings in the WHERE statement) you might need to use CAST or CONVERT to treat it as an actual date, and be able to use function like MIN/MAX on it.
In general I would suggest you to use proper data types for your column, if you have dates or timestamps data use the "date" or "datetime2" data types
Edit:
The query aggregates the data based on the column [wwid], and for each returns the maximum [last_active_time].
The result is then sorted and filtered.
In order to add more columns "as-is" (without aggregating them) just add them in the SELECT and GROUP BY sections.
If you need more aggregated columns add them in the SELECT with the appropriate aggregation function (MIN/MAX/SUM/etc)
I suggest you have a look at GROUP BY on W3
To know more about the "execution order" of the instruction you can have a look here
You can solve problem like this by rank ordering the results by a key and finding the last x of those items, this removes duplicates while preserving the key order.
;
WITH RankOrdered AS
(
SELECT
*,
wwidRank = ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC )
FROM
dbo.rvpvisitors
where
last_active_time!=''
)
SELECT TOP(10) * FROM RankOrdered WHERE wwidRank = 1
If my understanding is right, below query will give the desired output.
You can have conditions according to your need.
select top 10 distinct wwid from dbo.rvpvisitors order by last_active_time desc

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there is two separate records for 105. I only want the returned data set to return the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, albeit I am not that experience, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example: (I can't read your first column name, so I'm calling it JobUnitK
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use RANK function. Rank the rows OVER PARTITION BY UnitId and pick the rows with rank 2 .
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = Row_Number() OVER (PARTITION BY UnitID, ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
ORDER BY UnitID Desc
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case using Min(JobUnitKeyID) in conjunction with UnitID to join back on the UnitID where the JobUnitKeyID <> MinJobUnitKeyID` is required.
Except, using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.

MS Access : Average and Total Calculation in Single Query

INTRODUCTION TO DATABASE TABLE BEING USED -
I am working on a “Stock Market Prices” based Database Table. My table has got the data for the following FIELDS –
ID
SYMBOL
OPEN
HIGH
LOW
CLOSE
VOLUME
VOLUME CHANGE
VOLUME CHANGE %
OPEN_INT
SECTOR
TIMESTAMP
New data gets added to the table daily “Monday to Friday”, based on the stock market price changes for that day. The current requirement is based on the VOLUME field, which shows the volume traded for a particular stock on daily basis.
REQUIREMENT –
To get the Average and Total Volume for last 10,15 and 30 Days respectively.
METHOD USED CURRENTLY -
I created these 9 SEPARATE QUERIES in order to get my desired results –
First I have created these 3 queries to take out the most recent last 10,15 and 30 dates from the current table:
qryLast10DaysStored
qryLast15DaysStored
qryLast30DaysStored
Then I have created these 3 queries for getting the respective AVERAGES:
qrySymbolAvgVolume10Days
qrySymbolAvgVolume15Days
qrySymbolAvgVolume30Days
And then I have created these 3 queries for getting the respective TOTALS:
qrySymbolTotalVolume10Days
qrySymbolTotalVolume15Days
qrySymbolTotalVolume30Days
PROBLEM BEING FACED WITH CURRENT METHOD -
Now, my problem is that I have ended up having these so many different queries, whereas I wanted to get the output into One Single Query, as shown in the Snapshot of the Excel Sheet:
http://i49.tinypic.com/256tgcp.png
SOLUTION NEEDED -
Is there some way by which I can get these required fields into ONE SINGLE QUERY, so that I do not have to look into multiple places for the required fields? Can someone please tell me how to get all these separate queries into one -
A) Either by taking out or moving the results from these separate individual queries to one.
B) Or by making a new query which calculates all these fields within itself, so that these separate individual queries are no longer needed. This would be a better solution I think.
One Clarification about Dates –
Some friend might think why I used the method of using Top 10,15 and 30 for getting the last 10,15 and 30 Date Values. Why not I just used the PC Date for getting these values? Or used something like -
("VOLUME","tbl-B", "TimeStamp BETWEEN Date() - 10 AND Date()")
The answer is that I require my query to "Read" the date from the "TIMESTAMP" Field, and then perform its calculations accordingly for LAST / MOST RECENT "10 days, 15 days, 30 days” FOR WHICH THE DATA IS AVAILABLE IN THE TABLE, WITHOUT BOTHERING WHAT THE CURRENT DATE IS. It should not depend upon the current date in any way.
If there is any better method or more efficient way to create these queries, then please enlighten.
You have separate queries to compute 10DayTotalVolume and 10DayAvgVolume. I suspect you can compute both in one query, qry10DayVolumes.
SELECT
b.SYMBOL,
Sum(b.VOLUME) AS 10DayTotalVolume,
Avg(b.VOLUME) AS 10DayAvgVolume
FROM
[tbl-B] AS b INNER JOIN
qryLast10DaysStored AS q
ON b.TIMESTAMP = q.TIMESTAMP
GROUP BY b.SYMBOL;
However, that makes me wonder whether 10DayAvgVolume can ever be anything other than 10DayTotalVolume / 10
Similar considerations apply to the 15 and 30 day values.
Ultimately, I think you want something based on a starting point like this:
SELECT
q10.SYMBOL,
q10.[10DayTotalVolume],
q10.[10DayAvgVolume],
q15.[15DayTotalVolume],
q15.[15DayAvgVolume],
q30.[30DayTotalVolume],
q30.[30DayAvgVolume]
FROM
(qry10DayVolumes AS q10
INNER JOIN qry15DayVolumes AS q15
ON q10.SYMBOL = q15.SYMBOL)
INNER JOIN qry30DayVolumes AS q30
ON q10.SYMBOL = q30.SYMBOL;
That assumes you have created qry15DayVolumes and qry30DayVolumes following the approach I suggested for qry10DayVolumes.
If you want to cut down the number of queries, you could use subqueries for each of the qry??DayVolumes saved queries, but try it this way first to make sure the logic is correct.
In that second query above, there can be a problem due to field names which start with digits. Enclose those names in square brackets or re-alias them in qry10DayVolumes, qry15DayVolumes, and qry30DayVolumes using alias names which begin with letters instead of digits.
I tested the query as written above with the "2nd Upload.mdb" you uploaded, and it ran without error from Access 2007. Here is the first row of the result set from that query:
SYMBOL 10DayTotalVolume 10DayAvgVolume 15DayTotalVolume 15DayAvgVolume 30DayTotalVolume 30DayAvgVolume
ACC-1 42909 4290.9 54892 3659.46666666667 89669 2988.96666666667
Access doesn't support most advanced SQL syntax and clauses, so this is a bit of a hack, but it works, and is fast on your small sample. You're basically running 3 queries but the Union clauses allow you to combine into one:
select
Symbol,
sum([10DayTotalVol]) as 10DayTotalV,
sum([10DayAvgVol]) as 10DayAvgV,
sum([15DayTotalVol]) as 15DayTotalV,
sum([15DayAvgVol]) as 15DayAvgV,
sum([30DayTotalVol]) as 30DayTotalV,
sum([30DayAvgVol]) as 30DayAvgV
from (
select
Symbol,
sum(volume) as 10DayTotalVol, avg(volume) as 10DayAvgVol,
0 as 15DayTotalVol, 0 as 15DayAvgVol,
0 as 30DayTotalVol, 0 as 30DayAvgVol
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 10 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
sum(volume), avg(volume),
0, 0
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 15 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
0, 0,
sum(volume), avg(volume)
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 30 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
) s
group by
Symbol

Select query optimisation

I have a large table with ID, date, and some other columns. ID is indexed and sequential.
I want to select all rows after a certain date. Given that the IDs are sequential, if the rows are ordered by ID in decreasing order, once the first row that fails the date test there's no need to carry on checking. How can I make use of the index to optimise this?
You could do something like this:
With FirstFailDate AS
(
-- You start by selecting the first fail date
SELECT TOP 1 * FROM YOUR_TABLE WHERE /* DATE TEST FAILING */ ORDER BY ID DESC
)
SELECT *
FROM YOUR_TABLE t
-- Then, you join your table with the first fail date, and get all the records
-- that are before this date (by ID)
JOIN FirstFailDate f
ON f.ID > t.ID
I don't think there is a good "legal" way to do this without actually indexing date.
However, you could try something like this:
Issue the following query to the DBMS: SELECT * FROM YOUR_TABLE ORDER BY ID DESC.
Start fetching the rows in your client application.
As you fetch, check the date.
Stop fetching (and close the cursor) when the date passes the limit.
The idea is that DBMS sometimes doesn't have to finish the whole query before starting to send the partial results to the client. In this case, the hope is that the DBMS will perform an index scan on ID (due to the ORDER BY ID DESC), and you'll be able get the results as it happens and then stop it before it has even finished.
NOTE: If your DBMS gives you an option to balance between getting the first row fast, versus getting the whole result fast, pick the first option (such as /*+ FIRST_ROWS */ hint under Oracle).
Of course, perform measurements on realistic amounts of data, to make sure this actually works in your particular situation.

Performant way to get the maximum value of a running total in TSQL

We have a table of transactions which is structured like the following :
TranxID int (PK and Identity field)
ItemID int
TranxDate datetime
TranxAmt money
TranxAmt can be positive or negative, so the running total of this field (for any ItemID) will go up and down as time goes by. Getting the current total is obviously simple, but what I'm after is a performant way of getting the highest value of the running total and the TranxDate when this occurred. Note that TranxDate is not unique, and due to some backdating the ID field is not necessarily in the same sequence as TranxDate for a given Item.
Currently we're doing something like this (#tblTranx is a table variable containing just the transactions for a given Item) :
SELECT Top 1 #HighestTotal = z.TotalToDate, #DateHighest = z.TranxDate
FROM
(SELECT a.TranxDate, a.TranxID, Sum(b.TranxAmt) AS TotalToDate
FROM #tblTranx AS a
INNER JOIN #tblTranx AS b ON a.TranxDate >= b.TranxDate
GROUP BY a.TranxDate, a.TranxID) AS z
ORDER BY z.TotalToDate DESC
(The TranxID grouping removes the issue caused by duplicate date values)
This, for one Item, gives us the HighestTotal and the TranxDate when this occurred. Rather than run this on the fly for tens of thousands of entries, we only calculate this value when the app updates the relevant entry and record the value in another table for use in reporting.
The question is, can this be done in a better way so that we can work out these values on the fly (for multiple items at once) without falling into the RBAR trap (some ItemIDs have hundreds of entries). If so, could this then be adapted to get the highest values of subsets of transactions (based on a TransactionTypeID not included above). I'm currently doing this with SQL Server 2000, but SQL Server 2008 will be taking over soon here so any SQL Server tricks can be used.
SQL Server sucks in calculating running totals.
Here's a solution for your very query (which groups by dates):
WITH q AS
(
SELECT TranxDate, SUM(TranxAmt) AS TranxSum
FROM t_transaction
GROUP BY
TranxDate
),
m (TranxDate, TranxSum) AS
(
SELECT MIN(TranxDate), SUM(TranxAmt)
FROM (
SELECT TOP 1 WITH TIES *
FROM t_transaction
ORDER BY
TranxDate
) q
UNION ALL
SELECT DATEADD(day, 1, m.TranxDate),
m.TranxSum + q.TranxSum
FROM m
CROSS APPLY
(
SELECT TranxSum
FROM q
WHERE q.TranxDate = DATEADD(day, 1, m.TranxDate)
) q
WHERE m.TranxDate <= GETDATE()
)
SELECT TOP 1 *
FROM m
ORDER BY
TranxSum DESC
OPTION (MAXRECURSION 0)
You need to have an index on TranxDate for this to work fast.

Resources