Does Snowflake keep track of the warehouse resizing values? - snowflake-cloud-data-platform

I'm looking at WAREHOUSE_EVENTS_HISTORY, but I can't find a way to track the warehouse resizing values.
How can I get this data out of Snowflake?
(based on a customer question)

To track the resizing values, you can join WAREHOUSE_EVENTS_HISTORY with QUERY_HISTORY on query_id and parse the query_text:
use role accountadmin;
select a.timestamp, a.event_state, a.user_name, a.role_name,
       upper(regexp_substr(b.query_text, '(XSMALL|SMALL|MEDIUM|LARGE|XLARGE|XXLARGE|XXXLARGE|X4LARGE|X5LARGE|X6LARGE)', 1, 1, 'i')) as wh_size,
       upper(regexp_substr(b.query_text, '(STANDARD|ECONOMY)', 1, 1, 'i')) as scaling_policy
from snowflake.account_usage.WAREHOUSE_EVENTS_HISTORY a
join snowflake.account_usage.QUERY_HISTORY b
  on a.query_id = b.query_id
where a.event_name = 'ALTER_WAREHOUSE'
  and a.timestamp > '2021-10-01';
https://docs.snowflake.com/en/sql-reference/account-usage/warehouse_events_history.html
https://docs.snowflake.com/en/sql-reference/account-usage/query_history.html
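If you want to check what the two regular expressions pick up before running the query against your account, here is a minimal Python sketch that uses the same patterns on a few made-up ALTER WAREHOUSE statements. The function name parse_alter_warehouse and the sample statements are hypothetical, and Python's re module only approximates Snowflake's regexp_substr, so treat it as a quick sanity check rather than a faithful emulation.

```python
import re

# Same alternations as the regexp_substr calls in the query;
# re.IGNORECASE plays the role of the 'i' parameter.
SIZE_RE = re.compile(
    r"(XSMALL|SMALL|MEDIUM|LARGE|XLARGE|XXLARGE|XXXLARGE|X4LARGE|X5LARGE|X6LARGE)",
    re.IGNORECASE,
)
POLICY_RE = re.compile(r"(STANDARD|ECONOMY)", re.IGNORECASE)


def parse_alter_warehouse(query_text):
    """Return (wh_size, scaling_policy); either value is None when absent."""
    size = SIZE_RE.search(query_text)
    policy = POLICY_RE.search(query_text)
    return (
        size.group(1).upper() if size else None,
        policy.group(1).upper() if policy else None,
    )


if __name__ == "__main__":
    # Hypothetical statements, not taken from a real account.
    for text in (
        "alter warehouse my_wh set warehouse_size = xlarge scaling_policy = economy",
        "ALTER WAREHOUSE etl SUSPEND",
        "ALTER WAREHOUSE small_wh SET WAREHOUSE_SIZE = LARGE",
    ):
        print(text, "->", parse_alter_warehouse(text))
```

Note the last statement: the first match wins, so a warehouse whose name contains a size keyword (small_wh here) is reported as that size. If your warehouse names look like that, the pattern probably needs to be anchored on WAREHOUSE_SIZE.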
If you are thinking about analyzing costs, check:
https://medium.com/opendoor-labs/analyze-snowflake-costs-570b7be953db

Related

SQL Server trigger to track annual revenue

I have a simple database with 4 tables:
Customer (cusId)
Newspaper (papId)
SubCost (subId)
Subscription (cusId, papId, subId)
Newspaper has a column to track number of subscribers which is updated via a trigger on the Subscription table. It also has a column to track annual revenue which should be based on the number of subscribers and the cost associated with the subscription (subId).
I am looking for a trigger to track annual revenue. There are 3 subscription types (subId) with differing weekly costs and a paper can have more than one type of subscription so it can't just be (cost * 52 * numSubs).
Can you help me with this logic?
Your best bet is not to use such a column at all. Instead, use a view that computes the result, and index it if necessary:
CREATE OR ALTER VIEW vTotalSubs
WITH SCHEMABINDING AS
SELECT
    n.papid,
    TotalRevenue = SUM(sc.Cost * 52),
    TotalSubscriptions = COUNT_BIG(*)  -- you MUST have this column here if aggregating with an index
FROM dbo.Newspaper n
JOIN dbo.Subscription s ON s.papid = n.papid
JOIN dbo.SubCost sc ON sc.subid = s.subid
GROUP BY
    n.papid;
GO
CREATE UNIQUE CLUSTERED INDEX CX_vTotalSubs ON vTotalSubs (papid);
If you decide to index the view, be aware that indexed views have many restrictions, in particular:
- Only INNER JOIN is allowed: no other join types and no subqueries.
- The view must be schema-bound, and every table must be referenced with its schema.
- If aggregating, you must include COUNT_BIG(*), and the only other aggregation allowed is SUM.
- Add the WITH (NOEXPAND) hint when querying; otherwise there may be performance impacts.
The server maintains the index automatically; you do not need to update it.
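To see why a computed view handles papers with mixed subscription types, here is a small Python sketch using the standard sqlite3 module as a stand-in for SQL Server. It uses a plain (non-indexed) view, so schema binding and COUNT_BIG are not involved, and the sample costs and rows are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for the four tables, with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Newspaper (papId INTEGER PRIMARY KEY);
        CREATE TABLE SubCost (subId INTEGER PRIMARY KEY, Cost INTEGER);
        CREATE TABLE Subscription (cusId INTEGER, papId INTEGER, subId INTEGER);

        -- Plain view: revenue is summed per subscription row, so each row
        -- contributes the weekly cost of its own subscription type.
        CREATE VIEW vTotalSubs AS
        SELECT n.papId AS papId,
               SUM(sc.Cost * 52) AS TotalRevenue,
               COUNT(*) AS TotalSubscriptions
        FROM Newspaper n
        JOIN Subscription s ON s.papId = n.papId
        JOIN SubCost sc ON sc.subId = s.subId
        GROUP BY n.papId;
        """
    )
    conn.executemany("INSERT INTO Newspaper VALUES (?)", [(10,), (20,)])
    # Three subscription types with weekly costs 2, 5 and 7 (assumed values).
    conn.executemany("INSERT INTO SubCost VALUES (?, ?)", [(1, 2), (2, 5), (3, 7)])
    conn.executemany(
        "INSERT INTO Subscription VALUES (?, ?, ?)",
        [(100, 10, 1), (101, 10, 2), (102, 10, 2), (103, 20, 3)],
    )
    return conn


def annual_revenue(conn):
    """Return (papId, TotalRevenue, TotalSubscriptions) rows ordered by papId."""
    return conn.execute("SELECT * FROM vTotalSubs ORDER BY papId").fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(annual_revenue(conn))
    # A new subscription shows up in the view with no trigger involved.
    conn.execute("INSERT INTO Subscription VALUES (104, 20, 1)")
    print(annual_revenue(conn))
```

Paper 10 has one subscriber at cost 2 and two at cost 5, so its revenue is (2 + 5 + 5) * 52, which a single cost * 52 * numSubs formula could not produce.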

find and get data in large amount of SQL table

I have a simple DB table with only 5 columns and no primary key, holding 75 million+ (7,50,01,771) rows. Yes, you read that correctly. It has one clustered index.
DB table columns
Cluster index
If I write a simple SELECT query against it, it takes 7-8 minutes to return data. That leads to my question: what techniques can I apply to this table so that I get data back in a reasonable time?
In the actual scenario, this table is joined with 2 temp tables that hold data already filtered by WHERE clauses. My query is below for reference.
SELECT dt.ZipFrom, dt.ZipTo, dt.Total_time, sz.storelocation, sz.AcctShip, sz.Licensee, sz.Entity
FROM #Zips z
INNER JOIN DriveTime_ZIPtoZIP dt ON dt.ZipFrom = z.zip
INNER JOIN #storeZips sz ON dt.ZipTo = sz.zip
ORDER BY z.zip DESC, dt.Total_time ASC
Thanks
You can create indexes that match the join and WHERE conditions in the query. However, this comes at a cost: storage.
The ORDER BY clause matters too. If your query has to sort, you can design the index to match the sort order as well.
But do not forget, the cost of indexing ...
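As a rough illustration of indexing on the join columns, here is a Python sketch with the standard sqlite3 module standing in for SQL Server. The index name IX_DriveTime_From_To, the regular tables replacing the temp tables, and all rows are assumptions made up for the example; the right index for the real table depends on its actual columns and on the execution plan.

```python
import sqlite3

QUERY = """
SELECT dt.ZipFrom, dt.ZipTo, dt.Total_time, sz.storelocation
FROM Zips z
INNER JOIN DriveTime_ZIPtoZIP dt ON dt.ZipFrom = z.zip
INNER JOIN StoreZips sz ON dt.ZipTo = sz.zip
ORDER BY z.zip DESC, dt.Total_time ASC
"""


def build_demo_db():
    """In-memory stand-in for the tables in the question, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DriveTime_ZIPtoZIP (ZipFrom TEXT, ZipTo TEXT, Total_time INTEGER);
        CREATE TABLE Zips (zip TEXT);
        CREATE TABLE StoreZips (zip TEXT, storelocation TEXT);

        -- Hypothetical index: key order follows the join columns, and
        -- Total_time is included so the index alone can answer the query.
        CREATE INDEX IX_DriveTime_From_To
            ON DriveTime_ZIPtoZIP (ZipFrom, ZipTo, Total_time);
        """
    )
    conn.executemany(
        "INSERT INTO DriveTime_ZIPtoZIP VALUES (?, ?, ?)",
        [
            ("10001", "20001", 30),
            ("10001", "20002", 20),
            ("10002", "20001", 50),
            ("99999", "20001", 5),
        ],
    )
    conn.executemany("INSERT INTO Zips VALUES (?)", [("10001",), ("10002",)])
    conn.executemany(
        "INSERT INTO StoreZips VALUES (?, ?)", [("20001", "A"), ("20002", "B")]
    )
    return conn


def drive_times(conn):
    """Run the join from the question against the demo tables."""
    return conn.execute(QUERY).fetchall()


def query_plan(conn):
    """Return the human-readable plan steps reported by SQLite."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


if __name__ == "__main__":
    conn = build_demo_db()
    print(drive_times(conn))
    for step in query_plan(conn):
        print(step)
```

Printing the plan before and after creating an index is the quickest way to see whether the optimizer actually uses it; in SQL Server the equivalent check is the actual execution plan.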

Updating redundant/denormalized data automatically in SQL Server

I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
UserID
Username
UserDisplayName
TaskCount
Task
TaskID
TaskName
UserID
UserName
UserDisplayName
This is great for performance since the app has many more reads than insert, update or delete operations, and since some values like Username change rarely. However, the big drawback is that integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is: can this be done automatically in SQL Server 2005/2010... maybe via a persisted/permanent View? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
    -- Add the number of newly inserted tasks to each affected user.
    ;WITH added AS
    (
        SELECT ins.UserID, COUNT(*) AS [NumTasks]
        FROM INSERTED ins
        GROUP BY ins.UserID
    )
    UPDATE usr
    SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
    FROM dbo.[User] usr
    INNER JOIN added
        ON added.UserID = usr.UserID
    -- Subtract the number of deleted tasks from each affected user.
    ;WITH removed AS
    (
        SELECT del.UserID, COUNT(*) AS [NumTasks]
        FROM DELETED del
        GROUP BY del.UserID
    )
    UPDATE usr
    SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
    FROM dbo.[User] usr
    INNER JOIN removed
        ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
       u.Username,
       u.UserDisplayName,
       u.TaskCount,
       t.TaskID,
       t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
    ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, Unique Clustered Index, etc.) to make it "persisted". While it is inefficient to do an aggregation in a subquery in the SELECT as shown above, this specific case is intended to be denormalized in a situation that has higher reads than writes. So doing the Indexed View will keep the entire structure, including the aggregation, physically stored so each read will not recalculate it.
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work due to the 5000 restrictions on creating them. In that case, you can create a real table (UserTask) that is your denormalized structure and populate it either via a Trigger on just the User table (assuming you use the Trigger shown above, which updates the User table based on changes in the Task table), or by skipping the TaskCount field in the User table and having Triggers on both tables populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).
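If you want to try the TaskCount trigger idea without a SQL Server instance, here is a rough Python sketch using the standard sqlite3 module. It is a stand-in rather than a port: SQLite triggers fire once per row (NEW/OLD) instead of once per statement with INSERTED/DELETED, the trigger names are made up, and the sample rows are invented.

```python
import sqlite3


def build_demo_db():
    """In-memory User/Task tables with triggers that maintain TaskCount."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE User (
            UserID INTEGER PRIMARY KEY,
            Username TEXT,
            TaskCount INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE Task (
            TaskID INTEGER PRIMARY KEY,
            TaskName TEXT,
            UserID INTEGER REFERENCES User (UserID)
        );

        -- Row-level triggers: each inserted or deleted Task row adjusts
        -- the owning user's counter by one.
        CREATE TRIGGER Task_after_insert AFTER INSERT ON Task
        BEGIN
            UPDATE User SET TaskCount = TaskCount + 1 WHERE UserID = NEW.UserID;
        END;
        CREATE TRIGGER Task_after_delete AFTER DELETE ON Task
        BEGIN
            UPDATE User SET TaskCount = TaskCount - 1 WHERE UserID = OLD.UserID;
        END;
        """
    )
    conn.executemany(
        "INSERT INTO User (UserID, Username) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def task_counts(conn):
    """Return {UserID: TaskCount} as currently stored in the User table."""
    return dict(conn.execute("SELECT UserID, TaskCount FROM User"))


if __name__ == "__main__":
    conn = build_demo_db()
    conn.execute(
        "INSERT INTO Task (TaskName, UserID) VALUES ('a', 1), ('b', 1), ('c', 2)"
    )
    print(task_counts(conn))
    conn.execute("DELETE FROM Task WHERE TaskName = 'a'")
    print(task_counts(conn))
```

Like the T-SQL trigger, this sketch only covers INSERT and DELETE; moving a task to another user with an UPDATE would need a third trigger.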

Database design - Friend activities

Currently I am designing a small Twitter/Facebook kind of system, in which a user should be able to see their friends' latest activities.
I am using ASP.NET with MySQL database.
My Friendships table is as follows:
|Friendshipid|friend1|Friend2|confirmed|
Friend1 and Friend2 in the above table are userids.
The user activities table design is as follows:
|activityId|userid|activity|Dated|
Now, I am looking for the best way to query the latest 50 friend activities for a user.
For example, let's say if Tom logs into the system, he should be able to see latest 50 activities among all his friends.
Any pointers on the best practices, a query or any information is appreciated.
It largely depends on what data is stored in the Friendships table. For example, what order are the Friend1 and Friend2 fields stored in? If, for the fields (friend1, friend2) the tuple (1, 2) exists, will (2, 1) exist also?
If this is not the case, then this should work:
SELECT a.*
FROM Activities AS a
INNER JOIN Friendships AS f
    ON a.userid IN (f.friend1, f.friend2)
WHERE [my own id] IN (f.friend1, f.friend2)
  AND a.userid != [my own id]
  AND f.confirmed = TRUE
ORDER BY a.Dated DESC
LIMIT 50;
If you have database performance concerns, you may redefine the friendship table as follows:
friendshipid, userid, friendid, confirmed
The query for the latest 50 activities would then be:
SELECT act.*
FROM Activities AS act
INNER JOIN Friendships AS fs
    ON fs.friendid = act.userid
   AND fs.userid = 'logon_user_id'
   AND fs.confirmed = TRUE
ORDER BY act.dated DESC
LIMIT 50;
If there is an index on the Friendships(userid) column, the database gets a chance to optimize the query.
The redefined friendship table needs two rows for each friendship, but it still obeys the business rules and gives you a performance benefit when you need it.
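To make the two-rows-per-friendship design concrete, here is a small Python sketch using the standard sqlite3 module in place of MySQL. The index name, the function latest_friend_activities, and all sample rows are made up for illustration; confirmed is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3


def build_demo_db():
    """In-memory Friendships/Activities tables with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Friendships (
            friendshipid INTEGER PRIMARY KEY,
            userid INTEGER, friendid INTEGER, confirmed INTEGER
        );
        CREATE TABLE Activities (
            activityId INTEGER PRIMARY KEY,
            userid INTEGER, activity TEXT, Dated TEXT
        );
        CREATE INDEX IX_Friendships_userid ON Friendships (userid);
        """
    )
    # Each friendship is stored twice, once from each side.
    conn.executemany(
        "INSERT INTO Friendships VALUES (?, ?, ?, ?)",
        [
            (1, 1, 2, 1), (2, 2, 1, 1),  # 1 and 2: confirmed
            (3, 1, 3, 0), (4, 3, 1, 0),  # 1 and 3: not yet confirmed
            (5, 2, 4, 1), (6, 4, 2, 1),  # 2 and 4: confirmed
        ],
    )
    conn.executemany(
        "INSERT INTO Activities VALUES (?, ?, ?, ?)",
        [
            (1, 2, "posted", "2024-01-01"),
            (2, 3, "liked", "2024-01-02"),
            (3, 4, "joined", "2024-01-03"),
            (4, 2, "commented", "2024-01-04"),
            (5, 1, "posted", "2024-01-05"),
        ],
    )
    return conn


def latest_friend_activities(conn, user_id, limit=50):
    """Return activity ids of confirmed friends of user_id, newest first."""
    rows = conn.execute(
        """
        SELECT act.activityId
        FROM Activities AS act
        INNER JOIN Friendships AS fs
            ON fs.friendid = act.userid
           AND fs.userid = ?
           AND fs.confirmed = 1
        ORDER BY act.Dated DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(latest_friend_activities(conn, 1))
    print(latest_friend_activities(conn, 2))
```

Because every friendship has a row with the logged-in user in userid, the query needs no OR over two columns, which is what lets a single-column index on userid do the work.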

HELP with sql query involving two tables and a max date

I have two tables notifications and mailmessages.
Notifications table
- NotifyTime
- NotifyNumber
- AccountNumber
MailMessages table
- id
- messageSubject
- MessageNumber
- AccountNumber
My goal is to create a single sql query to retrieve distinct rows from mailmessages WHERE the accountnumber is a specific number AND the notifynumber=messagenumber AND ONLY the most recent notifytime from the notifications table where the accountnumbers match in both tables.
I am using sqlexpress2008 as a back-end to an asp.net page. This query should return distinct messages for an account with only the most recent date from the notifications table.
Please help! I'll buy you a beer!!!
Try this...
SELECT MM.*, Notify.MaxNotifyTime
FROM MailMessages MM
INNER JOIN (SELECT Max(NotifyTime) MaxNotifyTime, AccountNumber
            FROM Notifications
            GROUP BY AccountNumber) Notify ON (MM.AccountNumber=Notify.AccountNumber)
WHERE (MM.AccountNumber=1)
SELECT MM.MessageNumber, MAX(N.NotifyTime) MaxTime
FROM MailMessages MM
INNER JOIN Notifications N
ON MM.AccountNumber = N.AccountNumber AND MM.MessageNumber = N.NotifyNumber
WHERE MM.AccountNumber = 1
GROUP BY MM.MessageNumber
This limits to the given AccountNumber (=1) and outputs every associated MessageNumber together with the date of the most recent corresponding entry in Notifications.
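Here is a quick way to check the GROUP BY / MAX query on sample data, sketched in Python with the standard sqlite3 module standing in for SQL Server Express. The function name latest_notifications and every row are invented for the example, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def build_demo_db():
    """In-memory Notifications/MailMessages tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Notifications (
            NotifyTime TEXT, NotifyNumber INTEGER, AccountNumber INTEGER
        );
        CREATE TABLE MailMessages (
            id INTEGER PRIMARY KEY, messageSubject TEXT,
            MessageNumber INTEGER, AccountNumber INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO MailMessages VALUES (?, ?, ?, ?)",
        [(1, "Hello", 100, 1), (2, "Bye", 101, 1), (3, "Other", 100, 2)],
    )
    conn.executemany(
        "INSERT INTO Notifications VALUES (?, ?, ?)",
        [
            ("2024-01-01", 100, 1),
            ("2024-02-01", 100, 1),
            ("2024-01-15", 101, 1),
            ("2024-03-01", 100, 2),
        ],
    )
    return conn


def latest_notifications(conn, account_number):
    """Return (MessageNumber, most recent NotifyTime) pairs for one account."""
    return conn.execute(
        """
        SELECT MM.MessageNumber, MAX(N.NotifyTime) AS MaxTime
        FROM MailMessages MM
        INNER JOIN Notifications N
            ON MM.AccountNumber = N.AccountNumber
           AND MM.MessageNumber = N.NotifyNumber
        WHERE MM.AccountNumber = ?
        GROUP BY MM.MessageNumber
        ORDER BY MM.MessageNumber
        """,
        (account_number,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(latest_notifications(conn, 1))
```

Message 100 has two notifications for account 1, and only the later one survives the MAX, while the notification for account 2 is filtered out by the WHERE clause.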
