Snowflake Cost Governance Queries - snowflake-cloud-data-platform

I'm looking for pointers or references/links to queries that identify the various cost metrics. My requirement is to build Snowsight dashboards using those queries.
Storage Cost
Compute Cost
Managed Services Cost (Snowpipe / materialized views / etc.)
A sample query is given below; I need a similar set of queries for all the metrics mentioned above.
-- For Compute (credits used per warehouse, by hour of day, over the last month)
select to_char(start_time,'HH24') as hour,
    WAREHOUSE_NAME,
    sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh
where wmh.start_time >= dateadd(month, -1, current_date())
group by to_char(start_time,'HH24'), WAREHOUSE_NAME
order by 1;

https://medium.com/snowflake/monitoring-your-snowflake-organization-with-snowsight-b1acd470dc17
Below are some sample queries; you can adapt them to cater to your needs:
--COST BY MONTH PER WAREHOUSE--
-- Note: the 4 below is a placeholder $-per-credit rate; substitute your contracted rate.
SELECT
    WMH.WAREHOUSE_NAME,
    MONTHNAME(WMH.START_TIME) AS MONTH,
    SUM(4 * WMH.CREDITS_USED) AS DOLLARS_USED
FROM
    SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY WMH
GROUP BY
    WMH.WAREHOUSE_NAME,
    MONTHNAME(WMH.START_TIME);
--MONTHLY COST BY WAREHOUSE (CUSTOM/SNOWFLAKE)--
SELECT
    WMH.WAREHOUSE_NAME,
    MONTHNAME(WMH.START_TIME) AS MONTH,
    SUM(WMH.CREDITS_USED) AS TOTAL_CREDITS,
    SUM(WMH.CREDITS_USED_COMPUTE) AS CUSTOMER_COMPUTE,
    SUM(WMH.CREDITS_USED_CLOUD_SERVICES) AS SNOWFLAKE_COMPUTE
FROM
    SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY WMH
GROUP BY
    WMH.WAREHOUSE_NAME,
    MONTHNAME(WMH.START_TIME);
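For the Storage and Managed Services metrics asked about above, here are rough sketches along the same lines. The views and columns come from SNOWFLAKE.ACCOUNT_USAGE, and the dollar rates (like the 4 above) are placeholders for your contracted rates, so treat these as starting points rather than finished dashboard queries:
--STORAGE COST BY MONTH--
-- Average daily bytes (data + stage + fail-safe) converted to TB; 23 is a placeholder $/TB/month rate.
SELECT
    DATE_TRUNC('MONTH', USAGE_DATE) AS MONTH,
    AVG(STORAGE_BYTES + STAGE_BYTES + FAILSAFE_BYTES) / POWER(1024, 4) AS AVG_TB,
    AVG(STORAGE_BYTES + STAGE_BYTES + FAILSAFE_BYTES) / POWER(1024, 4) * 23 AS DOLLARS_USED
FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
GROUP BY DATE_TRUNC('MONTH', USAGE_DATE)
ORDER BY MONTH;
--MANAGED SERVICES (SNOWPIPE, MATERIALIZED VIEWS, ETC.) CREDITS BY MONTH--
-- Service types other than WAREHOUSE_METERING in METERING_HISTORY cover the serverless/managed
-- features (PIPE, MATERIALIZED_VIEW, AUTO_CLUSTERING, ...).
SELECT
    DATE_TRUNC('MONTH', START_TIME) AS MONTH,
    SERVICE_TYPE,
    SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE <> 'WAREHOUSE_METERING'
GROUP BY DATE_TRUNC('MONTH', START_TIME), SERVICE_TYPE
ORDER BY MONTH, SERVICE_TYPE;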

Related

SSRS: Why am I getting this aggregation?

I've recently discovered that SSRS is doing a bizarre aggregation and I really don't understand why. In this report I'm building, as with other SQL queries I've built, I tend to take preliminary results from an initial query, throw them into a temp table, and then run another query joining on that temp table to get the 'final' results I need to display. Here's an example:
--1. This query fetches all available rows based on the day (must be last day of month)
SELECT DISTINCT Salesperson ,c.Cust_Alias ,cost ,eomonth(CreateDate) createdate ,FaxNumber
INTO #equip
FROM PDICompany_2049_01.dbo.Customers c
JOIN PDICompany_2049_01.dbo.Customer_Locations cl ON c.Cust_Key = cl.CustLoc_Cust_Key
JOIN ricocustom..Equipment_OLD e ON e.FaxNumber = c.Cust_ID + '/' + cl.CustLoc_ID
JOIN PDICompany_2049_01.dbo.Charges ch ON ch.Chg_CustLoc_Key = cl.CustLoc_Key
WHERE Salesperson = @Salesperson
AND ch.Chg_Balance = 0
--2. This query fetches first result set, but filters further for matching date variable
SELECT DISTINCT (cost) EquipCost ,Salesperson ,DATEPART(YEAR, CreateDate) YEAR
,DATEPART(MONTH, CreateDate) MONTH ,Cust_Alias ,FaxNumber
INTO #equipcost
FROM #equip
WHERE Salesperson = @Salesperson
AND DATEPART(MONTH, CreateDate) = DATEPART(MONTH, @Start)
AND DATEPART(year, CreateDate) = DATEPART(year, @Start)
ORDER BY Cust_Alias
--3. Finally, getting sum of the EquipCost, with other KPI's, to put into my final result set
SELECT sum(EquipCost) EquipCost ,Salesperson ,YEAR ,MONTH ,Cust_Alias
INTO #temp_equipcost
FROM #equipcost
GROUP BY Salesperson ,year ,month ,Cust_Alias
Now I am aware that I could easily have reduced this to 2 queries instead of 3 in hindsight (and I have since gotten my results into a single query). But that's where I'm looking for the answer. In my GUI report, a row was showing 180 for EquipCost, but my query was showing 60. It wasn't until I collapsed the query into a single statement (as opposed to the 3) that the GUI report also displayed 60; the query itself still returns 60 either way.
I actually had this happen in another query as well, where I had 2 temp table result sets, but when I condensed it into one, my GUI report worked as expected.
Any ideas on why using multiple temp tables would affect my results via the GUI report in SQL Report Builder (NOT USING VB HERE!) but my SQL query within SSMS works as expected? And to be clear, only making the change described to the query and condensing it got my results, the GUI report in Report Builder is extremely basic, so nothing crazy regarding grouping, expressions, etc.
My best guess is that you accidentally had a situation where you did not properly clear the temp tables (or you populated the temp tables multiple times). As an alternative to temp tables, you could instead use table variables. Equally, you could use a single query against the production tables, using CTEs if you want it to "feel" like 3 separate queries, as sketched below.
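For illustration, a rough sketch of that single-query shape using CTEs; the table and column names are taken from your queries above and @Salesperson/@Start are the report parameters, so treat it as a sketch rather than a tested drop-in:
;WITH equip AS (
    -- 1. All rows for the salesperson where the charge balance is zero
    SELECT DISTINCT Salesperson, c.Cust_Alias, cost,
           EOMONTH(CreateDate) AS CreateDate, FaxNumber
    FROM PDICompany_2049_01.dbo.Customers c
    JOIN PDICompany_2049_01.dbo.Customer_Locations cl ON c.Cust_Key = cl.CustLoc_Cust_Key
    JOIN ricocustom..Equipment_OLD e ON e.FaxNumber = c.Cust_ID + '/' + cl.CustLoc_ID
    JOIN PDICompany_2049_01.dbo.Charges ch ON ch.Chg_CustLoc_Key = cl.CustLoc_Key
    WHERE Salesperson = @Salesperson
      AND ch.Chg_Balance = 0
),
equipcost AS (
    -- 2. Restrict to the month/year of the @Start parameter
    SELECT DISTINCT cost AS EquipCost, Salesperson,
           DATEPART(YEAR, CreateDate) AS [Year],
           DATEPART(MONTH, CreateDate) AS [Month],
           Cust_Alias, FaxNumber
    FROM equip
    WHERE DATEPART(MONTH, CreateDate) = DATEPART(MONTH, @Start)
      AND DATEPART(YEAR, CreateDate) = DATEPART(YEAR, @Start)
)
-- 3. Aggregate into the final result set
SELECT SUM(EquipCost) AS EquipCost, Salesperson, [Year], [Month], Cust_Alias
FROM equipcost
GROUP BY Salesperson, [Year], [Month], Cust_Alias;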

How to get multiple average values using subqueries

There are many accountants and each of them has jobs (paid by the hour) and I need to get the accountant name of every accountant who has an average job cost higher than the overall average of job costs. How do I do this?
SELECT Accountant_Name, AVG(job_cost) as 'Average'
FROM job_view
WHERE Average > (SELECT AVG (job_cost) AS AV
FROM job_view)
GROUP BY Accountant_Name;
Everything needed is in a view named job_view. The above code is not working; any help on modifications would be appreciated. Thanks in advance.
This should do it for you:
SELECT Accountant_Name
, AVG(Job_Cost) as 'Average'
FROM Job_View
GROUP BY Accountant_Name
HAVING AVG(Job_Cost) > (SELECT AVG(Job_Cost) FROM Job_View)
As per your comment, the error you're getting at WHERE Average > is because the alias Average is not visible in a WHERE clause; you would normally have to repeat the entire expression exactly as you defined it in the SELECT list.
But because the Average column is an aggregate function, that filter can only go further down, in the HAVING clause, because HAVING handles filtering on aggregate conditions.
Why all this? Because there are rules for the order in which the clauses of a query are evaluated, as explained here.
You'll still need to Group by Accountant_Name
SELECT Accountant_Name, AVG(job_cost) as 'Average'
FROM job_view
GROUP BY Accountant_Name
Having AVG(job_cost) > (SELECT AVG (job_cost) FROM job_view);

Looking for sample anomaly detection in bank transactions records (in TSQL or R)

I want to detect unexpected transactions in bank transaction record and in GL entries.
For example, for a given supplier the amounts are generally $2,000 or $5,000 per month, but suddenly there is a transaction for $10,000 or $200, which is unexpected.
I'll have multiple columns participating in this anomaly detection: customer, supplier, account number, transaction description, user entering the transaction (for the GL side) etc...
My data are in SQL Server, so I'm looking for sample code working in TSQL. Else I can rely on R scripts.
Thanks for your help.
Might not be exactly what you're looking for, but with Azure Machine Learning Studio you can use a One-Class classifier to create an experiment that detects anomalies depending on the threshold. And you can easily feed it the data from your SQL Server.
It might be overkill, but you can use the free version, which is more than you need.
An option would be to use a SQL windowing function to get the mean and standard deviation of the transaction amount per group and look for entries that are more than 1 standard deviation away from the mean.
CREATE TABLE #tmp(
Customer nvarchar(255),
Supplier nvarchar(255),
AccountNumber nvarchar(255),
Amount decimal(20, 2),
MeanAmount decimal(20, 2),
STD decimal(20, 2))
-- Mean and standard deviation of Amount per customer/supplier/account
INSERT INTO #tmp (Customer, Supplier, AccountNumber, Amount, MeanAmount, STD)
SELECT Customer, Supplier, AccountNumber, Amount,
AVG(Amount) OVER (PARTITION BY Customer, Supplier, AccountNumber),
STDEV(Amount) OVER (PARTITION BY Customer, Supplier, AccountNumber)
FROM SourceTable
-- Flag rows that deviate from their group mean by more than one standard deviation
SELECT *
FROM #tmp
WHERE ABS(Amount - MeanAmount) > STD

Why SQL query with short date range runs slow, while a long range runs quickly?

Platform: SQL Server 2008 R2
I have a reasonably complex SQL query that queries (2) distinct databases and collects a report based on the date range of the records. Searching through over 3,000,000 rows for a date range of, say, 2 months is nearly instantaneous. However, searching a short date range of, say, 7 days takes nearly two minutes. For the life of me, I cannot understand why this would be. Here's the query below:
;with cte_assets as (
select a.f_locationid, a.f_locationparent, 0 as [lev], convert(varchar(30), '0_' + convert(varchar(10), f_locationid)) lineage
from [db_assets].[dbo].[tb_locations] a where f_locationID = '366' UNION ALL
select a.f_locationid
,a.f_locationparent
,c.[lev] + 1
,convert(varchar(30), lineage + '_' + convert(varchar(10), a.f_locationid))
from cte_assets c
join [db_assets].[dbo].[tb_locations] a
on a.f_locationparent = c.f_locationID
),
cte_a AS
(
select f_assetnetbiosname as 'Computer Name'
from cte_assets c
JOIN [db_assets].[dbo].[tb_assets] ass on ass.f_assetlocation = c.f_locationID
)
select apps.f_applicationname, apps.f_applicationID, sum(f_runtime/60) [RunTime]
from cte_a c
JOIN [db_reports].[dbo].[tb_applicationusage] ss on ss.f_computername = c.[Computer Name]
JOIN [db_reports].[dbo].[tb_applications] apps
ON ss.f_application = apps.f_applicationID
WHERE ss.f_runtime IS NOT NULL AND f_starttime BETWEEN '1/26/2015 10:55:03 AM' AND '2/12/2015 10:55:03 AM'
group by apps.f_applicationname, ss.f_application, apps.f_applicationID
ORDER BY RunTime DESC
The final WHERE clause (3rd-to-last line) is where the date range is specified. The date range shown in the query of BETWEEN '1/26/2015 10:55:03 AM' AND '2/12/2015 10:55:03 AM' works quickly without issues. If we change the query to, for example, BETWEEN '1/27/2015 10:55:03 AM' AND '2/12/2015 10:55:03 AM' (just a day later) it takes over two minutes to run. I have absolutely no idea why a short range would cause the query to run more slowly. Any assistance is appreciated.
Thanks,
Beems
I apologize, I should have Googled this problem more thoroughly first. I found the answer on another StackOverflow post regarding the need to update statistics:
SQL query takes longer time when date range is smaller?
Using not the MSDN article linked in that post, but this one here:
Updates query optimization statistics on a table or indexed view. By default, the query optimizer already updates statistics as necessary to improve the query plan; in some cases you can improve query performance by using UPDATE STATISTICS or the stored procedure sp_updatestats to update statistics more frequently than the default updates.
Updating statistics ensures that queries compile with up-to-date statistics. However, updating statistics causes queries to recompile. We recommend not updating statistics too frequently because there is a performance tradeoff between improving query plans and the time it takes to recompile queries. The specific tradeoffs depend on your application. UPDATE STATISTICS can use tempdb to sort the sample of rows for building statistics.
I ran the following:
USE db_reports;
GO
UPDATE STATISTICS tb_applicationusage;
GO
Two seconds later the update was complete, and my queries over short, recent date ranges now run quickly.
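If you want to confirm how stale the statistics on that table were, here is a quick check with STATS_DATE (a sketch; the table name comes from the query above):
USE db_reports;
GO
-- One row per statistics object on the table, with the date it was last updated
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.tb_applicationusage');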
Thanks,
Beems

Inefficent Query Plans SQL Server 2008 R2

Good Day,
We experience ongoing issues with our databases which our internal DBAs are unable to explain.
Using the below query example:
Select Distinct
Date,
AccountNumber,
Region,
Discount,
ActiveBalance
Into
#sometemptable
From
anothertable With (Index(ondate)) --use this or the query takes much longer
Where
Date >='7/1/2013'
And ActiveBalance > 0
And Discount <> '0' and discount is not null
This query will often run for an hour plus before I end up needing to kill it.
However, if I run the query as follows:
Select Distinct
Date,
AccountNumber,
Region,
Discount,
ActiveBalance
Into
#sometemptable
From
anothertable With (Index(ondate)) --use this or the query takes much longer
Where
Date Between '7/1/2013' and '12/1/2013' --all of the dates are the first of the month
And ActiveBalance > 0
And Discount <> '0' and discount is not null
Followed by
Insert into #sometemptable
Select Distinct
Date,
AccountNumber,
Region,
Discount,
ActiveBalance
From
anothertable With (Index(ondate)) --use this or the query takes much longer
Where
Date Between '1/1/2014' and '6/1/2014' --all of the dates are the first of the month
And ActiveBalance > 0
And Discount <> '0' and discount is not null
I can run the query in less than 10 minutes. The particular tables I'm hitting are updated monthly. Statistics updates are run on these tables both monthly and weekly. Our DBAs, as mentioned before, do not understand why the top query takes so much longer than the combination of the smaller queries.
Any ideas? Any suggestions would be greatly appreciated!
Thanks,
Ron
This is just a guess, but when you do Date >= '7/1/2013' SQL will estimate approximately how many rows it will return, and if that row count is greater than some internal threshold it will do a scan instead of a seek, thinking that it needs to return enough data that a scan will be faster.
When you use the BETWEEN clause, SQL Server will do a seek because it knows it will not need to return the majority of rows that the table has.
I assume that it is doing a table scan when you do the >= search. Once you post the execution plans we will see for sure.
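In the meantime, here is a minimal way to capture the actual plans and I/O for both forms of the query, so you can see whether the >= version scans (a sketch; the table, index, and column names are taken from your post, and the INTO is dropped so the two plans are easy to compare side by side):
SET STATISTICS IO ON;
SET STATISTICS XML ON;  -- each statement also returns its actual execution plan as XML
-- >= version
SELECT DISTINCT Date, AccountNumber, Region, Discount, ActiveBalance
FROM anothertable WITH (INDEX(ondate))
WHERE Date >= '7/1/2013'
  AND ActiveBalance > 0
  AND Discount <> '0' AND Discount IS NOT NULL;
-- BETWEEN version: compare logical reads and whether ondate is used for a seek or a scan
SELECT DISTINCT Date, AccountNumber, Region, Discount, ActiveBalance
FROM anothertable WITH (INDEX(ondate))
WHERE Date BETWEEN '7/1/2013' AND '12/1/2013'
  AND ActiveBalance > 0
  AND Discount <> '0' AND Discount IS NOT NULL;
SET STATISTICS XML OFF;
SET STATISTICS IO OFF;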
