Underutilized warehouse - snowflake-cloud-data-platform

I need to report on Snowflake warehouse usage by hour, day, and week.
Specifically, I want to know the number of queries queued and running in each period so I can gauge warehouse utilization.
Can you please point me to the relevant tables and help with a query?

Here's a great blog post which will help:
https://www.snowflake.com/blog/understanding-snowflake-utilization-warehouse-profiling/
This page illustrates how you might better understand (visually) warehouse utilization:
https://docs.snowflake.net/manuals/user-guide/warehouses-load-monitoring.html
Snowflake provides functions and views to help you monitor warehouse usage, as outlined in the blog post and documented in these links:
https://docs.snowflake.net/manuals/sql-reference/functions/warehouse_load_history.html
https://docs.snowflake.net/manuals/sql-reference/account-usage/warehouse_load_history.html
https://docs.snowflake.net/manuals/sql-reference/account-usage/warehouse_metering_history.html
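As a rough starting point, a query against the ACCOUNT_USAGE view might look something like the sketch below; adjust the time filter and change the DATE_TRUNC unit to 'hour', 'day' or 'week' as needed:

    -- Average running and queued queries per hour, per warehouse, last 7 days
    -- (sketch only: change 'hour' to 'day' or 'week' for coarser buckets)
    SELECT
        DATE_TRUNC('hour', start_time)   AS period_start,
        warehouse_name,
        AVG(avg_running)                 AS avg_running_queries,
        AVG(avg_queued_load)             AS avg_queued_queries,
        AVG(avg_queued_provisioning)     AS avg_queued_provisioning
    FROM snowflake.account_usage.warehouse_load_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY 1, 2;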
I hope this helps...Rich

Related

Should we plan purchase module or inventory first for data warehouse design

I have been developing a data warehouse for analytical needs. As I am new to this, I started with my own work area of manufacturing-related tables and built reports for them. I have been following the Kimball approach, adding new business processes step by step.
Now, I would like to start working on Inventory or purchase transactions.
I would like to know the recommended order of processes for a good data warehouse design: for example, should I model purchase transactions or inventory transactions first, and so on for the other processes? For now I am considering a manufacturing organization. Is such a practice documented anywhere?
Thank you in advance for the help.

In Snowflake, how to determine "Cloud Services" credits vs "Compute" credits?

In Snowflake's web UI, go to 'Account' --> 'Billing & Usage'. Select one of the Warehouses on the left. Credit stats will appear. Drill into one by clicking on the magnifying glass. A new window will pop up with a stacked bar chart of credits used. The stacks consist of "Cloud Services" credits and "Compute" credits. I would like to know the detailed breakdown of items under each type of credit.
My goal is to increase performance and efficiency in our usage. But it's difficult to pinpoint some areas in need of improvement when Snowflake just reports "Compute" credits without any type of breakdown.
I have found the following pages:
https://docs.snowflake.com/en/user-guide/credits.html#
https://docs.snowflake.com/en/user-guide/admin-monitoring-usage.html#
One of them has some detail on "Cloud Services" credits, but I would like a little more detail (if it is available). I cannot find anything that lists what falls under "Compute" credits. In addition, I have familiarized myself a little with the SNOWFLAKE.ACCOUNT_USAGE schema. QUERY_HISTORY gives me great granular detail of "Cloud Services" usage. But again, I haven't found anything that breaks down "Compute" usage.
Anything I missed, or a point in the right direction would be appreciated. Thank you in advance.
Snowflake Compute credits are based on warehouse usage and are determined by the size of the warehouse and the amount of time the warehouse is up and running. Your best bet for detailed information on this is likely the following ACCOUNT_USAGE view:
Warehouse Metering History:
https://docs.snowflake.com/en/sql-reference/account-usage/warehouse_metering_history.html
This view gives you hourly credit consumption by warehouse. It also includes the cloud services credits for that hour, by warehouse.
Note that while you can get cloud services credits at the query_history level, that is not available for compute credits, since compute credits are based on the time the warehouse is running, not based on each query that is executed (for example, 5 queries could be running at the same time on the same warehouse, but you are only charged for the time the warehouse is running).
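For example, a query along these lines (a rough sketch) gives you the hourly split of compute vs. cloud services credits per warehouse:

    -- Hourly compute vs. cloud services credits per warehouse, last 30 days
    SELECT
        start_time,
        warehouse_name,
        credits_used_compute,
        credits_used_cloud_services,
        credits_used                  -- total = compute + cloud services
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    ORDER BY start_time, warehouse_name;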
Hope this helps. If you need clarification, add a comment below, and I can help answer further.

data model for performance monitoring for Tableau Server

I have a question regarding performance monitoring.
I am going to connect to the Tableau Server Postgres repository and extract the relevant information into our own database.
I am currently following a document that covers the steps needed to retrieve the performance information from Postgres, but what I really need to do is design a data model for our own database.
I'm not a strong DBA, so I may need help designing the data model, but the requirement is:
We want a model in place so that we can see how long workbooks take to load, and if any of them take longer than, say, 5 seconds, we are alerted so we can go in and investigate.
My current idea for the data model in very basic terms is having the following tables:
Users – Projects – Workbooks – Views – Performance
Essentially we have users who access various projects, each of which contains its own workbooks. The Views table is simply for workbook views, so that we can see how many times a workbook has been viewed and when. Finally, the Performance table is needed for the load times.
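In rough, Postgres-flavoured DDL terms (just a sketch of my idea; all table and column names here are placeholders), something like:

    -- Placeholder schema: one table per entity, performance rows per load event
    CREATE TABLE users       (user_id     SERIAL PRIMARY KEY, user_name     TEXT NOT NULL);
    CREATE TABLE projects    (project_id  SERIAL PRIMARY KEY, project_name  TEXT NOT NULL);
    CREATE TABLE workbooks   (workbook_id SERIAL PRIMARY KEY, project_id    INT REFERENCES projects, workbook_name TEXT NOT NULL);
    CREATE TABLE views       (view_id     SERIAL PRIMARY KEY, workbook_id   INT REFERENCES workbooks, viewed_by INT REFERENCES users, viewed_at TIMESTAMP NOT NULL);
    CREATE TABLE performance (perf_id     SERIAL PRIMARY KEY, workbook_id   INT REFERENCES workbooks, requested_by INT REFERENCES users, load_started_at TIMESTAMP NOT NULL, load_seconds NUMERIC(10,3) NOT NULL);

    -- Example alert query: workbook loads slower than 5 seconds
    SELECT w.workbook_name, p.load_seconds, p.load_started_at
    FROM performance p
    JOIN workbooks w ON w.workbook_id = p.workbook_id
    WHERE p.load_seconds > 5
    ORDER BY p.load_started_at DESC;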
This is a very basic description of what we need, but my question is simply: can anyone with knowledge of Tableau and data modelling help design a basic model and schema for this? It will need to be scalable so that it can handle as many Tableau Servers as possible.
Thank you very much,
I've found an article on the blog of a monitoring tool that I like, and maybe it could help you with your monitoring. I'm not an expert in PostgreSQL, but it's worth a look:
http://blog.pandorafms.org/how-to-monitor-postgress/
Hope this can help!

advice on appropriate database for click logging reporting

I am about to build a service that logs clicks and transactions from an e-commerce website. I expect to log millions of clicks every month.
I will use this to run reports to evaluate marketing efforts and site usage (similar to Google Analytics*). I need to be able to make queries, such as best selling product, most clicked category, average margin, etc.
*Since some actions occur later or offline, GA doesn't fulfill all our needs.
The reporting system will not be under a heavy load and it will only be used internally.
My plan is to place loggable actions in a queue and have a separate system store them in a database.
My question is which database I should use for this. Due to corporate IT policy I only have these options: SimpleDB (AWS), DynamoDB (AWS), or MS SQL/MySQL.
Thanks in advance!
Best regards,
Fredrik
Have you checked this excellent Amazon documentation page? http://aws.amazon.com/running_databases/ It helps to pick the best database from among their products.
From my experience, I would advise that you do not use DynamoDB for this purpose. There is no real SELECT equivalent and you will have a hard time modeling your data. It is feasible, but not trivial.
On the other hand, SimpleDB provides a SELECT operation that would considerably simplify the model. Nonetheless, it is not advised for volumes > 10GB: http://aws.amazon.com/running_databases/
For the last option (MS SQL/MySQL, e.g. on RDS), I think you can do pretty much everything with it.
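If you do end up on MS SQL/MySQL, a plain click-log table plus ordinary reporting queries would be a reasonable starting point. A rough sketch (MySQL flavour; the table and column names here are made up):

    -- Minimal click log table: one row per logged action
    CREATE TABLE click_log (
        click_id     BIGINT AUTO_INCREMENT PRIMARY KEY,
        occurred_at  DATETIME      NOT NULL,
        session_id   VARCHAR(64)   NOT NULL,
        action_type  VARCHAR(32)   NOT NULL,   -- e.g. 'click', 'purchase'
        product_id   INT           NULL,
        category_id  INT           NULL,
        revenue      DECIMAL(12,2) NULL,       -- filled in for transactions
        margin       DECIMAL(12,2) NULL,
        INDEX idx_occurred (occurred_at),
        INDEX idx_category (category_id),
        INDEX idx_product  (product_id)
    );

    -- Example report: most clicked categories in the last 30 days
    SELECT category_id, COUNT(*) AS clicks
    FROM click_log
    WHERE occurred_at >= NOW() - INTERVAL 30 DAY
      AND action_type = 'click'
    GROUP BY category_id
    ORDER BY clicks DESC
    LIMIT 10;

Millions of rows per month is well within what either engine handles comfortably for internal reporting, as long as the reporting columns are indexed.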

Improving performance with a flat table and a background SQL job

I'm running a classifieds website that has ads and comments on it. Traffic has grown considerably and the number of ads in the system has reached over 1.5 million, of which nearly 250K are active ads.
Now the problem is that the system has been designed to be very dynamic in terms of ad categories and the properties each kind of ad can have based on its category or sub-category, so to display an ad I have to join nearly 4 to 5 tables.
To solve this issue I have created a flat table (conceptually what I call a publishing table) and populate that table with an SQL Job every 3 to 4 minutes. Now for web requests I query that table to show ad listings or details.
I also have implemented a data cache of around 1 minute for each unique url combination for ad listings and for each ad detail.
I do the same thing for comments on ads (i.e. cache the comments and as the comments are hierarchical, I have used a flat table publishing model for them also, again populated with an SQL Job)
My questions are as follows:
Is the publishing model with a background SQL job a good design approach?
What approach would you (or others) take for scenarios like this?
How does a website like Facebook show comments in real time to millions of users, while making sure they do not lose any comment data by only keeping it in the cache and doing batch updates?
Starting at the end:
3. How does a website like Facebook show comments in real time to millions of users, while making sure they do not lose any comment data by only keeping it in the cache and doing batch updates?
A few things:
Smarter programming than you or me. They can put a large team on solving this problem for months.
Ignorance. They really don't care too much about a cache being a little outdated. No one will really notice.
Hardware ;) Many more, and more powerful, servers than yours.
That said, your approach sounds sensible.
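For what it's worth, the publishing pattern you describe usually boils down to a scheduled job that rebuilds (or merges into) the flat table from the normalized ones. A rough sketch in T-SQL terms, with made-up table and column names:

    -- Scheduled job (every few minutes): refresh the denormalized publishing
    -- table from the normalized ad/category/property tables
    BEGIN TRANSACTION;

    DELETE FROM ad_published;   -- or MERGE only changed rows via a modified_at column

    INSERT INTO ad_published (ad_id, title, category_name, price, property_summary, published_at)
    SELECT a.ad_id,
           a.title,
           c.category_name,
           a.price,
           p.property_summary,
           GETDATE()
    FROM   ads a
    JOIN   categories c ON c.category_id = a.category_id
    LEFT JOIN ad_property_summary p ON p.ad_id = a.ad_id
    WHERE  a.is_active = 1;

    COMMIT;

Whether you do a full rebuild or an incremental merge mostly depends on how long the full rebuild takes compared to your 3-4 minute refresh window.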
