trying to break down results of SQL query to show data for each month - sql-server

I'm very new to SQL and have a problem I can't figure out.
I'm trying to replace an Excel spreadsheet with a Power BI report. Currently our team runs a query every month to get the number of active users and types the result into an Excel sheet, which then graphs the number of users each month to show the increase. Since I don't want to enter data manually each month, my goal is to break this query down so that it returns the cumulative number of users as of each month.
Desired result would look something like this
dateCreated # of Users
----------------------
2008-10 295
2008-11 355
2008-12 470
2009-01 522
I was able to break it down enough to give me the amount created each month, but that doesn't give me the total amount each month. This is the query that I used and a sample of the results I got.
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') AS 'dateCreated',
COUNT(s.UserId) AS "# of Users"
FROM
ER.dbo.ssUser s,
ER.dbo.ssUserAddress USERADDR,
ER.dbo.ssAddress ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
GROUP BY
FORMAT(USERADDR.DateCreated, 'yyyy-MM')
result sample:
dateCreated # of Users
2008-10 295
2008-11 41
2008-12 22
2009-01 19
This is almost there, but I need a running total. I've tried a lot of different things including SUM, SUM OVER, COUNT OVER etc. My boss suggested a while loop. I can't get that to work either and everything I've read says that should be the last resort. Here is one example of my failed attempts
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') as 'dateCreated',
COUNT(s.UserId)
OVER(
PARTITION BY Month(USERADDR.DateCreated)
GROUP BY FORMAT(USERADDR.DateCreated, 'yyyy-MM')
)
AS "# of Users"
FROM
ER.dbo.User s,
ER.dbo.UserAddress USERADDR,
ER.dbo.Address ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
--original query which gives total number of users right now.
SELECT
count(s.UserId) AS "# of Users"
FROM
ER.dbo.User s,
ER.dbo.UserAddress USERADDR,
ER.dbo.Address ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'

You can do a window sum() on the aggregated count of users per month, like so:
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') [dateCreated],
SUM(COUNT(s.UserId)) OVER(ORDER BY FORMAT(USERADDR.DateCreated, 'yyyy-MM')) [# of Users]
FROM
ER.dbo.ssUser s
INNER JOIN ER.dbo.ssUserAddress USERADDR
ON s.UserId = USERADDR.UserId
INNER JOIN ER.dbo.ssAddress ADDRESS
ON USERADDR.AddressId = ADDRESS.AddressId
WHERE Isdefault = 1 AND Type = 'soldto'
group by FORMAT(USERADDR.DateCreated, 'yyyy-MM')
Notes:
always prefer proper, explicit join syntax (with the ON keyword) over implicit, old-school comma joins, which have long been discouraged - I modified your query accordingly
SQL Server uses square brackets for identifiers - you should avoid single quotes, as they are generally used for literal strings
you have unqualified column names in the WHERE clause (Isdefault, Type): always qualify column names in your query, so it is easy to see which table they belong to
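For anyone who wants to try the running-total idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name user_address, its single column, and the sample dates are all made up for illustration, and SQLite's strftime stands in for FORMAT. It also splits the logic into two steps (monthly counts in a CTE, then a window SUM over them) rather than nesting SUM(COUNT(...)), which is the same technique written more explicitly.

```python
import sqlite3

# Hypothetical sample data: one row per user, holding the creation date.
rows = [
    ("2008-10-03",), ("2008-10-17",), ("2008-10-29",),
    ("2008-11-05",), ("2008-11-20",),
    ("2009-01-12",),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_address (date_created TEXT)")
conn.executemany("INSERT INTO user_address VALUES (?)", rows)

# Step 1 (CTE): count the users created in each month.
# Step 2: a window SUM over those counts, ordered by month,
#         turns the per-month counts into a running total.
query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', date_created) AS month,
           COUNT(*) AS created
    FROM user_address
    GROUP BY month
)
SELECT month,
       created,
       SUM(created) OVER (ORDER BY month) AS running_total
FROM monthly
ORDER BY month
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

Note that a month with no new users (2008-12 in this sample) produces no row at all, so a chart built on this result would skip it unless the months come from a separate calendar table.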

Related

How to use the datebucket filter

Trying to use the :datebucket filter but it doesn't seem to work.
select date, address from database.table where address = 'xyz' group by :datebucket(date)
This returns the error that date isn't in the group by statement, but it is. If I add it separately to the group by statement, it just groups by the individual date instead of respecting the date bucket selection.
I'm not finding anything in the Snowflake documentation about how this filter is supposed to work, just that it exists.
This site, https://www.webagesolutions.com/blog/querying-data-in-snowflake, has an example of the :datebucket filter like this:
SELECT COUNT(ORDER_DATE) as COUNT_ORDER_DATE, ORDER_DATE
FROM ORDERS
GROUP BY :datebucket(ORDER_DATE), ORDER_DATE
ORDER BY COUNT_ORDER_DATE DESC;
So could your query work if it was modified like this:
SELECT
date,
address
FROM
database.table
WHERE
address = 'xyz'
GROUP BY :datebucket(date), date
:datebucket truncates the date into buckets, but you have selected the raw date.
This is like grouping by decade ('60s, '70s, '80s) while still asking for the actual year.
SELECT column1 as year,
truncate(year,-1) as decade
FROM VALUES (1),(2),(3),(14),(15),(16),(27),(28),(29);
gives:
YEAR DECADE
1 0
2 0
3 0
14 10
15 10
16 10
27 20
28 20
29 20
So if I try to select the raw year while grouping by the decade:
SELECT column1 as year
FROM VALUES (1),(2),(3),(14),(15),(16),(27),(28),(29)
GROUP BY truncate(year,-1)
ORDER BY 1;
gives the error
Error: 'VALUES.COLUMN1' in select clause is neither an aggregate nor in the group by clause. (line 15)
So if we move the decade into the selection, it makes sense:
SELECT truncate(column1,-1) as decade
FROM VALUES (1),(2),(3),(14),(15),(16),(27),(28),(29)
GROUP BY decade
ORDER BY 1;
and we get:
DECADE
0
10
20
So the problem is not :datebucket(date) itself. The problem is that while :datebucket(date) and date are related, from the perspective of GROUP BY they are two unrelated expressions.
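The same point can be checked outside Snowflake. Below is a small sketch using Python's sqlite3 module, with integer division standing in for truncate(year, -1); the table name years is invented for the example. One caveat: SQLite is lenient about bare columns in aggregate queries, so it would not reproduce the error message above. The sketch only compares how many groups each GROUP BY produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE years (year INTEGER)")
conn.executemany(
    "INSERT INTO years VALUES (?)",
    [(y,) for y in (1, 2, 3, 14, 15, 16, 27, 28, 29)],
)

# Grouping by the bucket alone: one row per decade.
by_bucket = conn.execute("""
    SELECT (year / 10) * 10 AS decade, COUNT(*) AS n
    FROM years
    GROUP BY decade
    ORDER BY decade
""").fetchall()

# Adding the raw value to GROUP BY: every distinct year is its own
# group again, so the bucket no longer collapses anything.
by_bucket_and_raw = conn.execute("""
    SELECT (year / 10) * 10 AS decade, year, COUNT(*) AS n
    FROM years
    GROUP BY decade, year
    ORDER BY year
""").fetchall()

print(by_bucket)               # three groups, three years each
print(len(by_bucket_and_raw))  # nine groups, one year each
```

This is why adding the raw date next to :datebucket(date) in GROUP BY makes the query run but brings back one group per individual date.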
I've been trying to use :datebucket(date) and :daterange, and I also needed the results in a Snowflake chart.
It was a bit tricky, because the value returned by :datebucket(date) is actually a date truncated to the selected date part. To chart it, I had to convert it to a char, and it worked!
select
to_char(:datebucket(start_time), 'YYYY.MM.DD # HH24') as start_time_bucket,
sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh
where
start_time = :daterange
group by :datebucket(start_time)
And if you're an ACCOUNTADMIN, you can now use the query to get the total credits usage by date :)
Last, to answer the main query by Tony, the query should be:
select date, address
from database.table
where address = 'xyz'
group by :datebucket(date), date, address
-- or
select :datebucket(date), address
from database.table
where address = 'xyz'
group by :datebucket(date), address
Try adding :datebucket(date) to the select list as well (not only to the group by). Also, you will probably need an aggregate function for the address field (for example, any_value(address)):
select :datebucket(date), any_value(address)
from database.table
where address = 'xyz'
group by :datebucket(date)

SQL Selecting all but newest result per id

I need to set a "waived" flag in my table for all but the newest result per id. I thought I had a query that will work here, but when I run a select on the query, I'm getting incorrect results - I saw one case where it selected both of the only two results for a particular id. I'm also getting multiple results with the same exact data.
What am I doing wrong here?
Here's my select statement:
select t.test_row_id, t.test_result_id, t.waived, t.pass, t.comment
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
order by t.test_row_id
Here's the actual query I want to run:
update EV.Test_Result
set waived = 1
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
If I understand this correctly, you are having problems because the ON predicate matches every qualifying pair of rows, so the join is many-to-many.
EV.Test_Result.test_row_id = t.test_row_id
and EV.Test_Result.start_time < t.start_time
This ON compares all of the start_time values that share the same id and returns every combination of rows where start_time is less than t.start_time. Clearly, this is not what you want.
and t.device_id = 1219
and t.waived = 0
These two conditions are filters rather than join conditions (although ON is technically a predicate too). I would prefer to apply them in a subquery/CTE, because that limits the number of rows SQL Server has to retrieve and compare.
Something like the following might be what you needed:
SELECT A.test_row_id
, A.test_result_id
, A.waived
, A.pass
, A.comment
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
, test_row_id
FROM EV.Test_Result
WHERE device_id = 1219
AND waived = 0
GROUP BY test_row_id
) AS T ON A.test_row_id = T.test_row_id
AND A.start_time < T.start_time
ORDER BY A.test_row_id
This query then returns a 1:M relationship between the values in the ON predicate, unlike the M:M query you had run.
UPDATE:
Since I sheepishly screwed up trying to alter my Query on SO, I'll redeem myself by explaining the physical and logical orders of basic SQL Query operators:
As you know, you write a simple SELECT statement like the following:
SELECT <grouping column>, SUM(<numeric column>) AS Cost
FROM <table_name>
WHERE <column> = 'some_value'
GROUP BY <grouping column>
HAVING SUM(<numeric column>) > some_value
ORDER BY <column>
Note that if you use an aggregate function, every other column in the SELECT list MUST appear in the GROUP BY or inside another aggregate function.
Now, SQL Server requires the clauses to be written in that order, although it logically processes them in the following order, which is worth memorizing:
FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
More details can be found in the SELECT documentation on MSDN, but this is why any column in the SELECT clause must be in the GROUP BY or in an aggregate function (SUM, MIN, MAX, etc.)... and also why my lazy code failed on your first attempt. :/
Note also that ORDER BY comes last (technically, TOP is applied after it), and that without ORDER BY the order of the result rows is not guaranteed. A ranking function such as DENSE_RANK has its own ORDER BY inside OVER (evaluated in the SELECT step), but that only controls how ranks are assigned, not the order in which rows come back.
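To make the logical order concrete, here is a small sketch using Python's sqlite3 module. The orders table and its rows are invented for the example, and SQLite is only a stand-in for SQL Server, but the clause order is the same. The comments number each clause in the order it is logically processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, cost INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("ann", "paid", 10), ("ann", "paid", 25), ("ann", "void", 99),
    ("bob", "paid", 5),
    ("cat", "paid", 40), ("cat", "paid", 1),
])

result = conn.execute("""
    SELECT customer, SUM(cost) AS total   -- 5. SELECT: build output columns
    FROM orders                           -- 1. FROM: pick the source rows
    WHERE status = 'paid'                 -- 2. WHERE: filter individual rows
    GROUP BY customer                     -- 3. GROUP BY: form the groups
    HAVING SUM(cost) > 20                 -- 4. HAVING: filter whole groups
    ORDER BY total DESC                   -- 6. ORDER BY: sort the final rows
""").fetchall()

print(result)
```

The void row is dropped by WHERE before any summing happens, bob's group is dropped by HAVING after summing, and ORDER BY can refer to the alias total because it runs after SELECT.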
Hope this helps solve the problem and better yet how SQL works. Cheers
You can try the ROW_NUMBER() function, ordering by timestamp descending and filtering out the rows where ROW_NUMBER is 1.
The query below should fetch all records per id except the latest one.
I tried it in Oracle with a table having the fields id, user_id, record_order and timestamp, and it worked:
select
<table_name_alias>.*
from
(
select
id,
user_id,
row_number() over (partition by id order by record_order desc) as record_number
from
<your_table_name>
) <table_name_alias>
where
record_number <>1;
If you are using Teradata, you can also try the QUALIFY clause. Not all databases support it.
Select
table_name.*
from table_name
QUALIFY row_number() over (partition by id order by record_order desc) <>1;
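Since the original goal was to set the waived flag, here is a minimal sketch of the ROW_NUMBER approach turned into an UPDATE, run against SQLite from Python. The table is a cut-down, hypothetical version of EV.Test_Result (the device_id and waived = 0 filters from the question are left out for brevity), and the sample rows are invented. On SQL Server the same idea would more likely be written as an updatable CTE, which is not shown or tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test_result (
        test_result_id INTEGER PRIMARY KEY,
        test_row_id    INTEGER,
        start_time     TEXT,
        waived         INTEGER DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO test_result (test_result_id, test_row_id, start_time) "
    "VALUES (?, ?, ?)",
    [(1, 10, "2016-01-01"), (2, 10, "2016-01-02"), (3, 10, "2016-01-03"),
     (4, 20, "2016-01-01"), (5, 20, "2016-01-05"),
     (6, 30, "2016-01-04")],
)

# Number the rows of each test_row_id from newest (rn = 1) to oldest,
# then waive everything except the newest one.
conn.execute("""
    UPDATE test_result
    SET waived = 1
    WHERE test_result_id IN (
        SELECT test_result_id
        FROM (
            SELECT test_result_id,
                   ROW_NUMBER() OVER (PARTITION BY test_row_id
                                      ORDER BY start_time DESC) AS rn
            FROM test_result
        ) AS ranked
        WHERE rn > 1
    )
""")

waived = [r[0] for r in conn.execute(
    "SELECT test_result_id FROM test_result WHERE waived = 1 "
    "ORDER BY test_result_id")]
print(waived)
```

Each row gets exactly one row number, so unlike the self-join on start_time < t.start_time, no row can be matched more than once.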

SUM of values generates incorrect totals

I'm writing a report to total a specific set of transactions grouped by day; however, the total is incorrect. The code listed below generates no errors, but the daily total is off by 11.
SELECT
'1,*'+char(13)+char(10)
+'80,1006062'+char(13)+char(10)
+'100,10'+char(13)+char(10)
+'2405,'+cast(sum(trans.QUANTITY) as varchar(18))+char(13)+char(10) --Census events --as varchar(10)
+'9999,'+cast(count(distinct trans.TX_ID) as varchar(18))+char(13)+char(10) --Count of Records (for analysis/ validation only)
+'2420,'+format(trans.SERVICE_DATE,'M/d/yyyy') --as service date of Observation Procedure
FROM PAT_ENC_HSP hsp
inner join HSP_TRANSACTIONS trans on hsp.HSP_ACCOUNT_ID=trans.HSP_ACCOUNT_ID
WHERE TRANS.TX_TYPE_HA_C = '1' AND-- Billed procedures
datediff(day,trans.SERVICE_DATE,cast(CURRENT_TIMESTAMP as date)) between 7 and 7 AND
PROC_ID in ('90068','94788','94790','94792','94794','10240')
group by format(trans.SERVICE_DATE,'M/d/yyyy')
order by format(trans.SERVICE_DATE,'M/d/yyyy')
This generates the results...
1,*80,1006062 100,10 2405,305.000 9999,90 2420,4/25/2016
I double checked my totals by exporting the results of this query into Excel, which also generated 90 records for 25 Apr. However, the total quantity was "294" and not "305"
SELECT DISTINCT TRANS.TX_TYPE_HA_C, TRANS.PROCEDURE_DESC, TRANS.PROC_ID, TRANS.DEPARTMENT, TRANS.QUANTITY, TRANS.SERVICE_DATE, TRANS.TX_ID, TRANS.TX_POST_DATE
FROM PAT_ENC_HSP HSP LEFT OUTER JOIN HSP_TRANSACTIONS TRANS on
HSP.HSP_ACCOUNT_ID=TRANS.HSP_ACCOUNT_ID
WHERE PROC_ID in ('90068','94788','94790','94792','94794','10240')
AND TRANS.TX_TYPE_HA_C = '1'
AND TRANS.SERVICE_DATE ={ts '2016-04-25 00:00:00'}
ORDER BY TRANS.TX_ID
Results are attached
Not knowing which of these was correct, I recreated the same query in Crystal Reports and once again received the 294 value. Unfortunately, I have to use the format from the initial query to upload the formatted results into another application. I'm not sure why the total values are not the same across all three methods and am assuming that I'm doing something wrong in the "cast(sum" statement.

Need help comparing difference between datetime stamps in SQL DB

I have a SQL Server 2012 Express database that logs user activities. I need some help creating a query to compare timestamps on the user activities based on the text in the notes. I am interested in figuring out how long it takes my users to perform certain activities. The activities they perform are stored as text in a NOTES column.
I want to build a query that will tell me the time difference for each [INVOICEID] from the ‘START NOTE’ to the next note for that invoice by that user. The note that starts the timer is always the same (here I used ‘START NOTE’, but I have multiple activities I will need to do this for, so I plan on simply changing that text in the query). The note that ends the timer will vary, because it is user-entered data. So I want to find the time difference between ‘START NOTE’ and the note that immediately follows it, entered by the same USERID for the same INVOICEID. Please see the SQL Fiddle for an example of my data:
http://sqlfiddle.com/#!3/a00d7/1
With the data in the sql fiddle I would want the results of this query to be:
INVOICE ID USERID TIME_Difference
100 5 1 day
101 5 3 days
102 5 9 days
(time_difference does not need to be formatted like that, standard SQL formatting is fine)
I don’t really know where to start with this. Please let me know if you can help.
Thanks
select a.userid,
a.invoiceid,
min(a.added),
min(b.added),
datediff(DAY, min(a.added), min(b.added))
from om_note a
left join om_note b
on a.userid = b.userid and a.invoiceid = b.invoiceid and a.added < b.added
where a.notes = 'START NOTE'
group by a.userid, a.invoiceid
;with x as (
select
o.*, sum(case when notes='START NOTE' then 1 else 0 end)
over(partition by o.invoiceid, o.userid order by o.added) as grp
from om_note o
),
y as (
select *,
row_number() over(partition by x.invoiceid, x.userid, x.grp order by x.added) as rn
from x
where grp > 0
)
select y1.invoiceid, y1.userid, datediff(hour, y1.added, y2.added)
from y y1
inner join y y2
on y1.invoiceid=y2.invoiceid and y1.userid=y2.userid and y1.grp=y2.grp
where y1.rn=1 and y2.rn=2
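A third option, not used in either answer above, is the LEAD window function, which SQL Server 2012 supports. The sketch below runs the idea on SQLite from Python. The om_note table and column names follow the queries above, but the rows are invented (the SQL Fiddle data is not reproduced here) and were chosen so the differences match the 1, 3 and 9 days the question asks for. julianday is SQLite's substitute for datediff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE om_note (invoiceid INTEGER, userid INTEGER, "
    "notes TEXT, added TEXT)"
)
conn.executemany("INSERT INTO om_note VALUES (?, ?, ?, ?)", [
    (100, 5, "START NOTE", "2015-01-01"),
    (100, 5, "called customer", "2015-01-02"),
    (100, 5, "closed", "2015-01-09"),
    (101, 5, "START NOTE", "2015-01-01"),
    (101, 5, "sent email", "2015-01-04"),
    (102, 5, "START NOTE", "2015-01-01"),
    (102, 5, "resolved", "2015-01-10"),
])

# LEAD looks one row ahead within the same invoice and user, so each
# note is paired with the timestamp of the note that follows it.
# The window must be computed before filtering, hence the subquery.
query = """
SELECT invoiceid, userid,
       CAST(ROUND(julianday(next_added) - julianday(added)) AS INTEGER) AS days
FROM (
    SELECT invoiceid, userid, notes, added,
           LEAD(added) OVER (PARTITION BY invoiceid, userid
                             ORDER BY added) AS next_added
    FROM om_note
) AS n
WHERE notes = 'START NOTE' AND next_added IS NOT NULL
ORDER BY invoiceid
"""
diffs = conn.execute(query).fetchall()
for row in diffs:
    print(row)
```

For invoice 100 the later "closed" note is ignored, because only the note immediately after START NOTE is paired with it.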

MS Access : Average and Total Calculation in Single Query

INTRODUCTION TO DATABASE TABLE BEING USED -
I am working on a “Stock Market Prices” based Database Table. My table has got the data for the following FIELDS –
ID
SYMBOL
OPEN
HIGH
LOW
CLOSE
VOLUME
VOLUME CHANGE
VOLUME CHANGE %
OPEN_INT
SECTOR
TIMESTAMP
New data gets added to the table daily “Monday to Friday”, based on the stock market price changes for that day. The current requirement is based on the VOLUME field, which shows the volume traded for a particular stock on daily basis.
REQUIREMENT –
To get the Average and Total Volume for last 10,15 and 30 Days respectively.
METHOD USED CURRENTLY -
I created these 9 SEPARATE QUERIES in order to get my desired results –
First I have created these 3 queries to take out the most recent last 10,15 and 30 dates from the current table:
qryLast10DaysStored
qryLast15DaysStored
qryLast30DaysStored
Then I have created these 3 queries for getting the respective AVERAGES:
qrySymbolAvgVolume10Days
qrySymbolAvgVolume15Days
qrySymbolAvgVolume30Days
And then I have created these 3 queries for getting the respective TOTALS:
qrySymbolTotalVolume10Days
qrySymbolTotalVolume15Days
qrySymbolTotalVolume30Days
PROBLEM BEING FACED WITH CURRENT METHOD -
Now, my problem is that I have ended up with so many different queries, whereas I wanted to get the output from one single query, as shown in this snapshot of the Excel sheet:
http://i49.tinypic.com/256tgcp.png
SOLUTION NEEDED -
Is there some way by which I can get these required fields into ONE SINGLE QUERY, so that I do not have to look into multiple places for the required fields? Can someone please tell me how to get all these separate queries into one -
A) Either by taking out or moving the results from these separate individual queries to one.
B) Or by making a new query which calculates all these fields within itself, so that these separate individual queries are no longer needed. This would be a better solution I think.
One Clarification about Dates –
Someone might wonder why I used Top 10, 15 and 30 to get the last 10, 15 and 30 date values. Why didn't I just use the PC date to get these values, or something like -
("VOLUME","tbl-B", "TimeStamp BETWEEN Date() - 10 AND Date()")
The answer is that I require my query to "Read" the date from the "TIMESTAMP" Field, and then perform its calculations accordingly for LAST / MOST RECENT "10 days, 15 days, 30 days” FOR WHICH THE DATA IS AVAILABLE IN THE TABLE, WITHOUT BOTHERING WHAT THE CURRENT DATE IS. It should not depend upon the current date in any way.
If there is any better method or more efficient way to create these queries, then please enlighten.
You have separate queries to compute 10DayTotalVolume and 10DayAvgVolume. I suspect you can compute both in one query, qry10DayVolumes.
SELECT
b.SYMBOL,
Sum(b.VOLUME) AS 10DayTotalVolume,
Avg(b.VOLUME) AS 10DayAvgVolume
FROM
[tbl-B] AS b INNER JOIN
qryLast10DaysStored AS q
ON b.TIMESTAMP = q.TIMESTAMP
GROUP BY b.SYMBOL;
However, that makes me wonder whether 10DayAvgVolume can ever be anything other than 10DayTotalVolume / 10
Similar considerations apply to the 15 and 30 day values.
Ultimately, I think you want something based on a starting point like this:
SELECT
q10.SYMBOL,
q10.[10DayTotalVolume],
q10.[10DayAvgVolume],
q15.[15DayTotalVolume],
q15.[15DayAvgVolume],
q30.[30DayTotalVolume],
q30.[30DayAvgVolume]
FROM
(qry10DayVolumes AS q10
INNER JOIN qry15DayVolumes AS q15
ON q10.SYMBOL = q15.SYMBOL)
INNER JOIN qry30DayVolumes AS q30
ON q10.SYMBOL = q30.SYMBOL;
That assumes you have created qry15DayVolumes and qry30DayVolumes following the approach I suggested for qry10DayVolumes.
If you want to cut down the number of queries, you could use subqueries for each of the qry??DayVolumes saved queries, but try it this way first to make sure the logic is correct.
In that second query above, there can be a problem due to field names which start with digits. Enclose those names in square brackets or re-alias them in qry10DayVolumes, qry15DayVolumes, and qry30DayVolumes using alias names which begin with letters instead of digits.
I tested the query as written above with the "2nd Upload.mdb" you uploaded, and it ran without error from Access 2007. Here is the first row of the result set from that query:
SYMBOL 10DayTotalVolume 10DayAvgVolume 15DayTotalVolume 15DayAvgVolume 30DayTotalVolume 30DayAvgVolume
ACC-1 42909 4290.9 54892 3659.46666666667 89669 2988.96666666667
Access doesn't support most advanced SQL syntax and clauses, so this is a bit of a hack, but it works, and is fast on your small sample. You're basically running 3 queries but the Union clauses allow you to combine into one:
select
Symbol,
sum([10DayTotalVol]) as 10DayTotalV,
sum([10DayAvgVol]) as 10DayAvgV,
sum([15DayTotalVol]) as 15DayTotalV,
sum([15DayAvgVol]) as 15DayAvgV,
sum([30DayTotalVol]) as 30DayTotalV,
sum([30DayAvgVol]) as 30DayAvgV
from (
select
Symbol,
sum(volume) as 10DayTotalVol, avg(volume) as 10DayAvgVol,
0 as 15DayTotalVol, 0 as 15DayAvgVol,
0 as 30DayTotalVol, 0 as 30DayAvgVol
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 10 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
sum(volume), avg(volume),
0, 0
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 15 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
0, 0,
sum(volume), avg(volume)
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 30 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
) s
group by
Symbol
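As an alternative to the UNION, the cutoff subqueries above can be combined with conditional aggregation, so the table is scanned once. The sketch below shows the idea on SQLite from Python, not on Access: the prices table, its rows, and the 2-day and 3-day windows (standing in for 10, 15 and 30) are all invented for the example, and the CTE and CASE syntax would need translating for Access (IIf might serve in place of CASE, but that is untested here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, ts TEXT, volume INTEGER)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("ACC", "2013-01-01", 100), ("ACC", "2013-01-02", 200),
    ("ACC", "2013-01-03", 300), ("ACC", "2013-01-04", 400),
    ("XYZ", "2013-01-01", 10),  ("XYZ", "2013-01-02", 20),
    ("XYZ", "2013-01-03", 30),  ("XYZ", "2013-01-04", 40),
])

# "cut" holds, for each window, the earliest of the N most recent
# dates stored in the table (independent of today's date).
# Each CASE yields NULL outside its window, and SUM/AVG skip NULLs.
query = """
WITH cut AS (
    SELECT
        (SELECT MIN(ts) FROM (SELECT DISTINCT ts FROM prices
                              ORDER BY ts DESC LIMIT 2)) AS c2,
        (SELECT MIN(ts) FROM (SELECT DISTINCT ts FROM prices
                              ORDER BY ts DESC LIMIT 3)) AS c3
)
SELECT p.symbol,
       SUM(CASE WHEN p.ts >= cut.c2 THEN p.volume END) AS total_2d,
       AVG(CASE WHEN p.ts >= cut.c2 THEN p.volume END) AS avg_2d,
       SUM(CASE WHEN p.ts >= cut.c3 THEN p.volume END) AS total_3d,
       AVG(CASE WHEN p.ts >= cut.c3 THEN p.volume END) AS avg_3d
FROM prices AS p
CROSS JOIN cut
GROUP BY p.symbol
ORDER BY p.symbol
"""
volumes = conn.execute(query).fetchall()
for row in volumes:
    print(row)
```

Because AVG skips the NULLs, a symbol that did not trade on every day in a window is averaged over the days it actually has, which may or may not be what the report needs.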

Resources