In Snowflake's web UI, go to 'Account' --> 'Billing & Usage'. Select one of the Warehouses on the left. Credit stats will appear. Drill into one by clicking on the magnifying glass. A new window will pop up with a stacked bar chart of credits used. The stacks consist of "Cloud Services" credits and "Compute" credits. I would like to know the detailed breakdown of items under each type of credit.
My goal is to increase performance and efficiency in our usage. But it's difficult to pinpoint some areas in need of improvement when Snowflake just reports "Compute" credits without any type of breakdown.
I have found the following pages:
https://docs.snowflake.com/en/user-guide/credits.html#
https://docs.snowflake.com/en/user-guide/admin-monitoring-usage.html#
One of them has some detail on "Cloud Services" credits, but I would like a little more detail (if it is available). I cannot find anything that lists what falls under "Compute" credits. In addition, I have familiarized myself a little with the SNOWFLAKE.ACCOUNT_USAGE schema. QUERY_HISTORY gives me great granular detail of "Cloud Services" usage. But again, I haven't found anything that breaks down "Compute" usage.
Anything I missed, or a pointer in the right direction, would be appreciated. Thank you in advance.
Snowflake Compute credits are based on warehouse usage and are determined by the size of the warehouse and the amount of time the warehouse is up and running. For example, a Medium warehouse is billed at 4 credits per hour, so running it for 30 minutes consumes roughly 2 Compute credits no matter how many queries it served in that window. Your best bet for detailed information on this is likely the following ACCOUNT_USAGE view:
Warehouse Metering History:
https://docs.snowflake.com/en/sql-reference/account-usage/warehouse_metering_history.html
This view provides hourly credit consumption by warehouse. It also breaks out the cloud services credits for each hour, per warehouse.
Note that while you can get cloud services credits at the QUERY_HISTORY level, that is not available for compute credits, since compute credits are based on the time the warehouse is running, not on each query that is executed (for example, five queries could be running at the same time on the same warehouse, but you are only charged for the time the warehouse is running).
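If it helps, below is a rough sketch of the kind of query you could run against that view. The CREDITS_USED_COMPUTE and CREDITS_USED_CLOUD_SERVICES columns come from the WAREHOUSE_METERING_HISTORY documentation linked above; the query itself is just an illustration, so adjust the date range and grouping to taste:

-- Daily compute vs. cloud services credits per warehouse, last 30 days
SELECT
  warehouse_name,
  DATE_TRUNC('day', start_time)    AS usage_day,
  SUM(credits_used_compute)        AS compute_credits,
  SUM(credits_used_cloud_services) AS cloud_services_credits,
  SUM(credits_used)                AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY usage_day, warehouse_name;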
Hope this helps. If you need clarification, add a comment below, and I can help answer further.
Related
We create approximately 60 dashboards for different stakeholders in our organisation to monitor various activities on #Google_data_studio. But we have no way of knowing whether stakeholders actually look at our analytical dashboards. Is there any provision for this? If yes, please let us know. We already tried this with Google Analytics, but it shows only the number of viewers and not exactly who viewed the report.
If anyone has tried this, please let us know. We are expecting the following fields in the viewer report:
Name of the viewer
The time and day the report was viewed (to check frequency)
How much time the viewer spent on it.
You can measure report usage by adding a Google Analytics ID to your report. If you have multiple reports, you can use the same ID across all of them. The link has the step-by-step process for this. However, this won't show the name/identity of the report viewer.
I'm new to systems development and was wondering if someone more experienced than I could help me figure out some issues about database, web services, and overall architecture.
I have a web scraper that's supposed to run daily. It will collect, scrub, and aggregate data on local businesses from multiple publicly available government data sources. This data goes to a Postgres DB.
The user will then have an admin dashboard where they can see some metrics and trends. What I don't know is if this dashboard should query the DB every time the user loads the dashboard.
I imagine this is not the wisest approach, since it would overload and slow down the DB with multiple JOINs, SUMs, COUNTs, etc. I believe it would be best to compile these metrics overnight and store them somewhere? Or hourly?
I was doing some research and came across these "Analytical Databases". Is that what I should use? This similar question seems to have solved the puzzle, especially #samxli's comment on the accepted answer.
I could really use some direction-pointing here. How is analytics commonly handled in production? Thank you so much in advance! :thumbs-up:
Solution details:
NodeJS web scraper with CAPTCHA bypassing collects public data daily
Data from multiple sources is scrubbed, aggregated, and saved to a Postgres DB
Data contains public information about local businesses - see below
A dashboard shows historical data (time series), metrics, and trends
Sample record:
{
trade_name: "ACME Inc.",
legal_name: "Watchmen Hero Services Incorporated",
active: true,
foundation_date: "2018-11-23",
sector: "services",
main_activity: { id: 12318, name: "Law enforcement" },
secondary_activities: [],
address: {}, // standard address object
location: { lat: -23.2319, long: 42.1212 },
...
}
Sample metrics:
Total number of active and inactive companies over time per sector and activity
Estimated tax revenue over time per district and activity
Top N most common activities per city district
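For reference, a metric like the last one above would translate into roughly this kind of aggregate query against the raw data (the companies table and its district / main_activity_name columns are just an illustrative flattening of the sample record shown earlier):

-- Top 10 most common main activities per city district (illustrative schema)
SELECT district, activity_name, company_count
FROM (
  SELECT
    c.district,
    c.main_activity_name AS activity_name,
    COUNT(*)             AS company_count,
    ROW_NUMBER() OVER (PARTITION BY c.district ORDER BY COUNT(*) DESC) AS activity_rank
  FROM companies c
  WHERE c.active
  GROUP BY c.district, c.main_activity_name
) ranked
WHERE activity_rank <= 10
ORDER BY district, company_count DESC;

Running several queries like this on every dashboard load is exactly what I'm worried about.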
I can see a few options. I agree with you in that, at scale, you want to separate reading and writing so that analytics doesn't impact your system performance.
You might want to look into replication - https://www.brianstorti.com/replication/. You can read from a "read replica" and get a near-realtime view of the data, but without a massive disruptive impact on write performance.
Alternatively, if you want to do some more work and get something that can work well at scale, dig deeper into your findings on analytical databases (OLAP) and look at building out a Star schema (https://en.wikipedia.org/wiki/Star_schema). You can put an ETL (Extract, Transform, Load) process in place to pull data from your transactional database into your analytics database in a format that can be much easier to aggregate and work with. I've worked on something similar with hundreds of data sources synced in 30-minute batches into a data warehouse. This might be overkill if you only have a single data source though.
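As a very rough sketch of what that could look like for your data (table and column names here are purely illustrative, not a recommendation of the exact design):

-- Dimension tables describe the "who/what/where" of each fact
CREATE TABLE dim_activity (
  activity_id   INTEGER PRIMARY KEY,
  activity_name TEXT NOT NULL,
  sector        TEXT NOT NULL
);

CREATE TABLE dim_district (
  district_id   INTEGER PRIMARY KEY,
  district_name TEXT NOT NULL,
  city          TEXT NOT NULL
);

-- Fact table: one row per (day, district, activity) with pre-aggregated counts,
-- populated by the nightly ETL from the transactional Postgres DB
CREATE TABLE fact_company_daily (
  snapshot_date  DATE    NOT NULL,
  district_id    INTEGER NOT NULL REFERENCES dim_district,
  activity_id    INTEGER NOT NULL REFERENCES dim_activity,
  active_count   INTEGER NOT NULL,
  inactive_count INTEGER NOT NULL,
  PRIMARY KEY (snapshot_date, district_id, activity_id)
);

-- The dashboard then reads cheap, pre-aggregated rows instead of re-joining raw data
SELECT snapshot_date, SUM(active_count) AS active_companies
FROM fact_company_daily
GROUP BY snapshot_date
ORDER BY snapshot_date;

The nightly (or hourly) ETL job upserts the fact rows, and the dashboard only ever touches the small pre-aggregated tables.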
Lastly, instead of Postgres, if you're primarily dealing with time-series data and metrics, also consider the Elastic Stack (https://hackernoon.com/elastic-stack-a-brief-introduction-794bc7ff7d4f).
If you don't go the Elastic route, also consider some BI (business intelligence) tools like PowerBI to build your dashboards, rather than reinventing the wheel.
I am about to build a service that logs clicks and transactions from an e-commerce website. I expect to log millions of clicks every month.
I will use this to run reports to evaluate marketing efforts and site usage (similar to Google Analytics*). I need to be able to make queries, such as best selling product, most clicked category, average margin, etc.
*As some actions occur at later times and offline, GA doesn't fulfill all our needs.
The reporting system will not have a heavy load and it will only be used internally.
My plan is to place loggable actions in a queue and have a separate system store these to a database.
My question is what database I should use for this. Due to corporate IT policy, I only have these options: SimpleDB (AWS), DynamoDB (AWS), or MS SQL/MySQL.
Thanks in advance!
Best regards,
Fredrik
Have you checked this excellent Amazon documentation page? http://aws.amazon.com/running_databases/ It helps you pick the best database from their products.
From my experience, I would advise that you do not use DynamoDB for this purpose. There is no real SELECT equivalent and you will have a hard time modeling your data. It is feasible, but not trivial.
On the other hand, SimpleDB provides a select operation that would considerably simplify the model. Nonetheless, it is advised against for volumes > 10 GB: http://aws.amazon.com/running_databases/
For the last option, MS SQL/MySQL (i.e., RDS on AWS), I think you can do pretty much everything with it.
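To illustrate the relational route, a minimal click/transaction log plus one of the reports you mention could look something like the sketch below. Table and column names are only an assumption for the example (MySQL-flavoured DDL):

-- One row per click, written by the worker that drains the queue
CREATE TABLE click_log (
  click_id    BIGINT AUTO_INCREMENT PRIMARY KEY,
  clicked_at  DATETIME NOT NULL,
  session_id  VARCHAR(64) NOT NULL,
  product_id  INT NULL,
  category_id INT NULL
);

-- One row per purchased line item
CREATE TABLE transaction_log (
  transaction_id BIGINT AUTO_INCREMENT PRIMARY KEY,
  ordered_at     DATETIME NOT NULL,
  product_id     INT NOT NULL,
  quantity       INT NOT NULL,
  revenue        DECIMAL(12,2) NOT NULL,
  cost           DECIMAL(12,2) NOT NULL
);

-- Example report: best-selling products and their average margin, last 30 days
SELECT
  product_id,
  SUM(quantity)                             AS units_sold,
  AVG((revenue - cost) / NULLIF(revenue, 0)) AS avg_margin
FROM transaction_log
WHERE ordered_at >= NOW() - INTERVAL 30 DAY
GROUP BY product_id
ORDER BY units_sold DESC
LIMIT 20;

If volume grows, you can later roll the raw tables up into daily summary tables and point the reports at those instead.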
My app requires daily reports based on various user activities. My current design does not sum the daily totals in the database, which means I must compute them every time.
For example, a report that shows the top 100 users based on the number of submissions they have made on a given day.
For such a report, if I have 50,000 users, what is the best way to create the daily report?
How do I create monthly and yearly reports with such data?
If this is not a good design, how do I deal with such a design decision when the report metrics are not clear at database-design time, and by the time they are clear we already have a huge amount of data with limited parameters (fields)?
Please advise.
Ideally, I would advise you to create your data model in such a way that all of the items that need to be reported can be precomputed, in order to minimize the amount of querying that has to be done on the database. It sounds like you might not be able to do that, and in any case, it is an approach that can be brittle and resistant to change.
With the release of the 1.3.1 version of the SDK, you now have access to query cursors, which makes it a good deal easier to generate reports over a large number of users. You could use App Engine cron jobs to put a job on a task queue that computes the numbers for the report.
Since any given invocation of your task is unlikely to complete in the time that AppEngine allows it to run, you'll have to pass the query cursor from one instance to the next until it finishes.
This approach allows you to adapt to changes in your database and reporting needs, as you can rework the task that computes the report values fairly easily.
I'm coding a new {monthly|yearly} paid site with the now typical "referral" system: when a new user signs up, they can specify the {username|referral code} of another user (this can be detected automatically if they came through a special URL), which will cause the referrer to earn a percentage of anything the new user pays.
Before reinventing the wheel, I'd like to know if any of you have experience with storing this kind of data in a relational DB. Currently I'm using MySQL, but I believe any good solution should be easily adapted to any RDBMS, right?
I'm looking to support the following features:
Online billing system - once each invoice is paid, earnings for referrals are calculated and the referrers will be able to cash out. This includes, of course, the possibility of browsing invoices / payments online.
Paid options vary - they are different in nature and in cost (which will vary over time), so commissions should be calculated based on each final invoice.
Keeping track of referrals (the relationship between users, the date on which the referral was made, and any other useful information - any ideas?)
A simple way to access historical referral data (how much has been paid) or accrued commissions.
In the future, I might offer to exchange accrued cash for subscription renewal (covering the whole of the new subscription or just a part of it, having to pay the difference if needed)
Multiple levels - I'm thinking of paying something around 10% of direct referred earnings + 2% the next level, but this may change in the future (add more levels, change percentages), so I should be able to store historical data.
Note that I'm not planning to use this in any other project, so I'm not worried about it being "plug and play".
Have you done any work with similar requirements? If so, how did you handle all this stuff? Would you recommend any particular DB schema? Why?
Is there anything I'm missing that would help making this a more flexible implementation?
Rather marvellously, there's a library of database schemas. Although I can't see anything specific to referrals, there may be something related. At least (hopefully) you should be able to get some ideas.
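For what it's worth, here is one very rough way the referral and commission tracking could be laid out in MySQL. Table and column names are only an illustration of the idea (keep commission percentages per level as dated rows, and freeze the percentage onto each earned commission), not a recommended final design:

-- Who referred whom, and when
CREATE TABLE referral (
  referred_user_id INT PRIMARY KEY,          -- a user can only be referred once
  referrer_user_id INT NOT NULL,
  referred_at      DATETIME NOT NULL,
  referral_code    VARCHAR(32) NULL
);

-- Commission rules kept as historical rows, so old payouts stay reproducible
CREATE TABLE commission_rate (
  level          INT NOT NULL,               -- 1 = direct referral, 2 = next level, ...
  percentage     DECIMAL(5,2) NOT NULL,      -- e.g. 10.00, 2.00
  effective_from DATE NOT NULL,
  PRIMARY KEY (level, effective_from)
);

-- One row per commission earned, frozen at the moment the invoice is paid
CREATE TABLE commission (
  commission_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
  invoice_id     BIGINT NOT NULL,            -- the paid invoice it derives from
  earner_user_id INT NOT NULL,               -- who receives the commission
  source_user_id INT NOT NULL,               -- whose payment generated it
  level          INT NOT NULL,
  percentage     DECIMAL(5,2) NOT NULL,      -- copied from commission_rate at pay time
  amount         DECIMAL(12,2) NOT NULL,
  created_at     DATETIME NOT NULL
);

Because the percentage and amount are copied onto each commission row when the invoice is paid, you can later add levels or change percentages without rewriting history, and "accrued commissions" is just a SUM over the commission table minus whatever has already been cashed out or applied to renewals.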