MS Access: Average and Total Calculation in Single Query

INTRODUCTION TO DATABASE TABLE BEING USED -
I am working on a "Stock Market Prices" database table. My table has data for the following FIELDS:
ID
SYMBOL
OPEN
HIGH
LOW
CLOSE
VOLUME
VOLUME CHANGE
VOLUME CHANGE %
OPEN_INT
SECTOR
TIMESTAMP
New data gets added to the table daily, Monday to Friday, based on the stock market price changes for that day. The current requirement is based on the VOLUME field, which shows the volume traded for a particular stock on a daily basis.
REQUIREMENT –
To get the Average and Total Volume for the last 10, 15 and 30 days respectively.
METHOD USED CURRENTLY -
I created these 9 SEPARATE QUERIES in order to get my desired results.
First, I created these 3 queries to pull the most recent 10, 15 and 30 dates from the table:
qryLast10DaysStored
qryLast15DaysStored
qryLast30DaysStored
Then I created these 3 queries to get the respective AVERAGES:
qrySymbolAvgVolume10Days
qrySymbolAvgVolume15Days
qrySymbolAvgVolume30Days
And then I created these 3 queries to get the respective TOTALS:
qrySymbolTotalVolume10Days
qrySymbolTotalVolume15Days
qrySymbolTotalVolume30Days
PROBLEM BEING FACED WITH CURRENT METHOD -
Now, my problem is that I have ended up with all these different queries, whereas I want the output in one single query, as shown in this snapshot of an Excel sheet:
http://i49.tinypic.com/256tgcp.png
SOLUTION NEEDED -
Is there some way I can get these required fields into ONE SINGLE QUERY, so that I do not have to look in multiple places for them? Can someone please tell me how to get all these separate queries into one:
A) Either by taking the results from these separate individual queries and moving them into one.
B) Or by making a new query which calculates all these fields itself, so that these separate individual queries are no longer needed. This would be the better solution, I think.
One Clarification about Dates –
Some might wonder why I used TOP 10, 15 and 30 to get the last 10, 15 and 30 date values. Why didn't I just use the PC's date, or something like -
("VOLUME","tbl-B", "TimeStamp BETWEEN Date() - 10 AND Date()")
The answer is that I require my query to read the dates from the TIMESTAMP field and perform its calculations for the last / most recent 10, 15 or 30 days FOR WHICH DATA IS AVAILABLE IN THE TABLE, without bothering about what the current date is. It should not depend upon the current date in any way.
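To illustrate the idea, qryLast10DaysStored boils down to something like this sketch (the [tbl-B] table name is borrowed from the answers below); it returns the 10 most recent distinct TIMESTAMP values regardless of today's date:
SELECT DISTINCT TOP 10 t.TIMESTAMP
FROM [tbl-B] AS t
ORDER BY t.TIMESTAMP DESC;
The 15- and 30-day versions differ only in the TOP value.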
If there is any better method or more efficient way to create these queries, then please enlighten me.

You have separate queries to compute 10DayTotalVolume and 10DayAvgVolume. I suspect you can compute both in one query, qry10DayVolumes.
SELECT
b.SYMBOL,
Sum(b.VOLUME) AS 10DayTotalVolume,
Avg(b.VOLUME) AS 10DayAvgVolume
FROM
[tbl-B] AS b INNER JOIN
qryLast10DaysStored AS q
ON b.TIMESTAMP = q.TIMESTAMP
GROUP BY b.SYMBOL;
However, that makes me wonder whether 10DayAvgVolume can ever be anything other than 10DayTotalVolume / 10. (It could, if a symbol has no row for one or more of those 10 dates; Avg divides by the number of rows actually present, not by 10.)
Similar considerations apply to the 15 and 30 day values.
Ultimately, I think you want something based on a starting point like this:
SELECT
q10.SYMBOL,
q10.[10DayTotalVolume],
q10.[10DayAvgVolume],
q15.[15DayTotalVolume],
q15.[15DayAvgVolume],
q30.[30DayTotalVolume],
q30.[30DayAvgVolume]
FROM
(qry10DayVolumes AS q10
INNER JOIN qry15DayVolumes AS q15
ON q10.SYMBOL = q15.SYMBOL)
INNER JOIN qry30DayVolumes AS q30
ON q10.SYMBOL = q30.SYMBOL;
That assumes you have created qry15DayVolumes and qry30DayVolumes following the approach I suggested for qry10DayVolumes.
If you want to cut down the number of queries, you could use subqueries in place of each of the qry??DayVolumes saved queries (see the sketch after the sample output below), but try it this way first to make sure the logic is correct.
In that second query above, there can be a problem due to field names which start with digits. Enclose those names in square brackets, or re-alias them in qry10DayVolumes, qry15DayVolumes, and qry30DayVolumes using alias names which begin with letters instead of digits.
I tested the query as written above with the "2nd Upload.mdb" you uploaded, and it ran without error from Access 2007. Here is the first row of the result set from that query:
SYMBOL 10DayTotalVolume 10DayAvgVolume 15DayTotalVolume 15DayAvgVolume 30DayTotalVolume 30DayAvgVolume
ACC-1 42909 4290.9 54892 3659.46666666667 89669 2988.96666666667
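For illustration, here is what the 10-day aggregation could look like with the saved qryLast10DaysStored replaced by a derived table, using letter-first aliases to sidestep the digit problem (an untested sketch of the subquery approach suggested above; the 15- and 30-day versions would differ only in the TOP value):
SELECT
b.SYMBOL,
Sum(b.VOLUME) AS TotalVolume10Day,
Avg(b.VOLUME) AS AvgVolume10Day
FROM
[tbl-B] AS b INNER JOIN
(SELECT DISTINCT TOP 10 t.TIMESTAMP
FROM [tbl-B] AS t
ORDER BY t.TIMESTAMP DESC) AS q
ON b.TIMESTAMP = q.TIMESTAMP
GROUP BY b.SYMBOL;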

Access doesn't support much advanced SQL syntax, so this is a bit of a hack, but it works, and it is fast on your small sample. You're basically running 3 queries, but the UNION clauses let you combine them into one (UNION ALL would also do, and skips the duplicate-elimination step, as long as the three branches never produce identical rows):
select
Symbol,
sum([10DayTotalVol]) as 10DayTotalV,
sum([10DayAvgVol]) as 10DayAvgV,
sum([15DayTotalVol]) as 15DayTotalV,
sum([15DayAvgVol]) as 15DayAvgV,
sum([30DayTotalVol]) as 30DayTotalV,
sum([30DayAvgVol]) as 30DayAvgV
from (
select
Symbol,
sum(volume) as 10DayTotalVol, avg(volume) as 10DayAvgVol,
0 as 15DayTotalVol, 0 as 15DayAvgVol,
0 as 30DayTotalVol, 0 as 30DayAvgVol
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 10 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
sum(volume), avg(volume),
0, 0
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 15 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
0, 0,
sum(volume), avg(volume)
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 30 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
) s
group by
Symbol

Related

Workaround on Sliding Window Function in Snowflake

I've stumbled upon a problem that is giving me huge headaches, which is the following:
I have a table Deals, that contains information about this entity from our Sales CRM. I also have a table Company, that contains information about the companies pegged to those deals.
I was asked to compute a metric called Pipeline Conversion Rate, which is calculated as:
Won deals / Created Deals
Until here, everything is quite clear. Nevertheless, when computing this metric I was asked to do so in a sliding-window-function fashion, which means computing the metric looking only at the prior 90 days. The thing is that the two counts rely on different dates: the created deals are windowed by created date, while the won deals need to be windowed by closed date (both dimensions are part of the Deals table).
There wouldn't be any problem if we could write this kind of window function in Snowflake (I know the syntax may not be exactly this, but you get the idea):
count(deal_id) over (
partition by is_inbound, sales_agent, sales_tier, country
order by created_date range between 90 days preceding and current row
) as created_deals_last_90_days,
count(case when is_deal_won then deal_id end) over (
partition by is_inbound, sales_agent, sales_tier, country
order by created_date range between 90 days preceding and current row
) as won_deals_last_90_days
But as far as I know, we can't. So my current workaround is the following (taken from this post):
select
calendar_date,
is_inbound,
sales_tier,
sales_agent,
country,
(
select count(deal_id)
from deals
where d.is_inbound = is_inbound
and d.sales_tier = sales_tier
and d.sales_agent = sales_agent
and d.country = country
and created_date between cal.calendar_date - 90 and cal.calendar_date
) as created_deals_last_90_days,
(
select count(case when is_deal_won then deal_id end)
from deals
where d.is_inbound = is_inbound
and d.sales_tier = sales_tier
and d.sales_agent = sales_agent
and d.country = country
and closed_date between cal.calendar_date - 90 and cal.calendar_date
) as won_deals_last_90_days
from calendar as cal
left join deals as d on cal.calendar_date between d.created_date and d.closed_date
*Note that I am using a calendar table as the base table here, in order to have visibility of all calendar dates; without it I would be missing the dates on which there are no new deals (which could happen on weekends).
The problem is that I am not getting correct figures when I cross-check the raw data against the output of this query, and I have no idea how to make this (ugly) workaround, well... work.
Any ideas are more than welcome!
Well, it turns out it was way easier than I expected. After some trial-and-error, I figured out the only thing that could be failing was the JOIN condition in the outer query:
on cal.calendar_date between d.created_date and d.closed_date
This was assuming that both dates needed to be in the range, which is wrong. Tweaking that part of the outer query to:
on cal.calendar_date >= d.created_date
captures all the deals that were created on or before calendar_date, and therefore all of them, since created_date is a mandatory field.
Keeping the rest of the query as is, and assuming there are no nulls in any of the partition columns, the results are the ones I expected.
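As an aside, for anyone who needs to build the calendar base table used above, Snowflake can generate one with its row generator. A minimal sketch, where the start date and number of days are assumptions:
select dateadd(day, row_number() over (order by seq4()) - 1, '2020-01-01'::date) as calendar_date
from table(generator(rowcount => 1000));
(row_number() is used rather than raw seq4() because Snowflake does not guarantee seq4() values to be gap-free.)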

Date difference between events in same column

I am new to MS SQL and I am having difficulty getting correct results. I currently have a table with the columns updateddate (date), addedby (number), operationid (number) and servicereqno (number), selected from three different tables.
The operationid is an event and the servicereqno is a job number. I want to get the time difference between consecutive events and then, once this is established, calculate the average.
For example:
opid 512 is at 20:15 - servicereqno 1
opid 535 is at 21:23 - servicereqno 1
My first task is to determine the difference in time between events, ensuring both events are within the same job number.
Many thanks
In order to get the timespan between events, you need to number all operations sequentially (consecutive increments, with no gaps), and then join that on itself with an offset of 1. You'll get n-1 rows as a result, with the timespan in between.
Something like this:
WITH cteOps AS (
SELECT ROW_NUMBER() OVER (PARTITION BY servicereqno ORDER BY updateddate) seqid, updateddate, servicereqno
FROM yourdatasource
)
SELECT DATEDIFF(millisecond, o2.updateddate, o1.updateddate) updateddatediff, o1.servicereqno
FROM cteOps o1
JOIN cteOps o2 ON o1.seqid=o2.seqid+1 AND o1.servicereqno=o2.servicereqno;
Of course you can perform aggregates on that to get the average or whatever you need.
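For instance, a minimal sketch of the average gap per job, reusing the numbering CTE above (yourdatasource stands in for your actual table, as in the answer):
WITH cteOps AS (
SELECT ROW_NUMBER() OVER (PARTITION BY servicereqno ORDER BY updateddate) seqid, updateddate, servicereqno
FROM yourdatasource
),
cteDiffs AS (
SELECT DATEDIFF(millisecond, o2.updateddate, o1.updateddate) updateddatediff, o1.servicereqno AS servicereqno
FROM cteOps o1
JOIN cteOps o2 ON o1.seqid = o2.seqid + 1 AND o1.servicereqno = o2.servicereqno
)
SELECT servicereqno, AVG(updateddatediff) AS avgdiffmilliseconds
FROM cteDiffs
GROUP BY servicereqno;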
SQL Server provides a built-in DATEDIFF function, which you can use to find the difference between two dates. Furthermore, SQL Server has a built-in AVG function that you can use to calculate the average of a set of values.

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example, in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to include the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, albeit I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
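If you need the whole row rather than just the key, one option is to join that aggregate back to the table, along the lines of this sketch (table and column names as assumed above):
SELECT j.*
FROM JobUnits j
INNER JOIN (
SELECT MAX(JobUnitK) AS MaxK, Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
) d ON j.Unit = d.Unit AND j.JobUnitK = d.MaxK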
Use the RANK function: rank the rows with OVER (PARTITION BY UnitId ORDER BY your id column) and pick the rows with rank 2.
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
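A minimal sketch of that approach, assuming JobUnitKeyID (from the answer below) is the ordering column:
SELECT *
FROM (
SELECT *, RANK() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID) AS rk
FROM JobUnits
) t
WHERE rk = 2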
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
ORDER BY UnitID DESC;
This is better than a solution that uses Max(JobUnitKeyID), for multiple reasons:
There could be more than one duplicate, in which case using Min(JobUnitKeyID) in conjunction with UnitID, and joining back on UnitID where JobUnitKeyID <> MinJobUnitKeyID, would be required.
Also, using Min or Max requires you to join back to the same data, which is inherently slower.
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.

Performance issue with DATEADD function

This is my actual select query,
SELECT b.CaseNumber as CaseNumber,b.DebtorNr ,b.ActionDate,DATEADD(MONTH,-12,b.ActionDate) one_month,a.Registerdate --COALESCE(count(A.historynr),0) as DebtorActivity
from rr..r_basic_info b
join rr..activities_VW as A on b.DebtorNr=a.Debtornr
where
B.Debtornr = A.Debtornr
--and a.Registerdate<=b.ActionDate --this condition works
and a.registerdate >= DATEADD(month,-12,getdate()) --I have a problem with this condition; it causes huge time consumption
I have a view defined. Here is activities_VW:
select H.NR as historynr,o.debtornr as Debtornr, O.NR as ordernr, h.Actmenunr as Actmenunr,h.AGREEMENT as AgreementCode, h.Registerdate as Registerdate
from abc..history h join abc..orders o on o.NR=h.ORDERNR
My execution plan looks like this:
One more piece of information: in every row, the b.ActionDate column has an identical value, e.g. '2015-04-11 08:37:44.037'.
I have checked all the date formats but found nothing wrong.
In another case, b.ActionDate has different values in different rows, and the query works fine there.
Thanks
I may be wrong in my understanding, so take this only as a possibility: when a function is applied within a join criterion and/or WHERE clause, then in order to determine whether the data meets the criterion, it may have to be evaluated against every row in the table.
Think about the first part, WHERE e.DATE <= a.joining_date: here you can simply look directly at the rows that are less than or equal to a.joining_date.
For the second part, AND e.DATE >= DATEADD(MONTH, -6, a.joining_date): there is no column that is "joining date minus 6 months", so to determine whether e.DATE is greater than it, that calculation has to be performed on every instance of a.joining_date in the table.
Remember that WHERE clause conditions are not necessarily evaluated in the order they are written down in the query, so the rows you would think are eliminated by the first part of your WHERE are not necessarily eliminated by it. So, as one of the comments suggested, using a computed/persisted column for the DATEADD(MONTH, -6, a.joining_date) would probably work well.
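For illustration, a persisted computed column of that shape could look like this sketch (the table and column names are assumptions following the answer's example; DATEADD with a fixed offset is deterministic, so the column can be persisted and indexed):
ALTER TABLE applicants
ADD joining_date_minus_6m AS DATEADD(MONTH, -6, joining_date) PERSISTED;
CREATE INDEX IX_applicants_joining_date_minus_6m
ON applicants (joining_date_minus_6m);
The predicate can then compare e.DATE directly against the stored column, which allows an index seek instead of computing the expression per row.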

MSSQL Comparing rows same table

Hi, I'm looking to compare several rows and check whether a certain condition is true/false.
The table has several columns; the ones I'm interested in are:
Events.Badgeno
Events.Name
Events.Date
Events.Time
Events.Region_id
Events.Data
The region ID can be either 1 or 2.
I want to check whether the same badge number registers with a different region within a specified date/time difference, say 10 minutes (it could be 10 minutes before or 10 minutes after).
I'm looking to show the records which don't have a record against both regions.
As a further note, it should only consider the first and last records of that badge per day.
Normally each badge should have a region 1 and a region 2 record at the start and end of the day, but there may be multiple region 1 records throughout the day.
Any suggestions for the best method?
Id Date Time Name Badgeid Region
3385033 27/02/2014 08:16:11 FirstName Surname 5304 2
I think something like this would work
SELECT e.Badgeno, e.Name, e.Date, e.Time, e.Region_id, e.Data
FROM events e
LEFT JOIN events e1 ON e1.BadgeNo = e.BadgeNo AND e1.Region_id <> e.Region_id AND DATEDIFF(minute, e1.date + e1.time, e.date + e.time) > -10 AND DATEDIFF(minute, e1.date + e1.time, e.date + e.time) < 10
WHERE e1.Region_id IS NULL
You should provide sample data.
This query is not complete; you can try something with ROW_NUMBER/RANK/DENSE_RANK with a partition, and then check the generated row number column.
select *,
row_number() over (partition by Badgeno, Region_id order by [Date], [Time]) rn
from events
where <your date/time condition here>
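Note that a window function can't be referenced in the same query's WHERE clause; to filter on the generated number column you would wrap it in a derived table, roughly like this sketch (the rn = 1 filter is just an assumed example):
select *
from (
select *,
row_number() over (partition by Badgeno, Region_id order by [Date], [Time]) rn
from events
) t
where rn = 1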
