Say I have warehouse A with 5 queries running between 1 PM and 2 PM, and it costs me X credits. If I increase the number of queries from 5 to 10 in the same time window, does that cost me X + X = 2X credits? (Assume the 10 queries are not all identical.)
Load on the warehouse
Credits used for the warehouse during the same time frame.
Details of credits used from WAREHOUSE_METERING_HISTORY
To reduce credit usage you have to use the warehouse effectively. If 2X queries are executed within the same time frame on the same warehouse, without increasing the warehouse size, you will be charged the same amount. The number of executions does not incur cost; cost is incurred for warehouse up-time.
If your warehouse can bear the load of nX queries within the same time frame, you should take advantage of that. These are some common considerations for optimized credit usage:
Grouping queries together
Maximizing use of warehouse up-time
Minimizing warehouse idle time
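To see this for yourself, you can compare the hourly credits reported by the metering view before and after adding the extra queries. A minimal sketch, assuming a warehouse named WAREHOUSE_A (replace with your own warehouse name):

-- Hourly credit usage for one warehouse over the last day
select start_time, end_time, credits_used
from snowflake.account_usage.warehouse_metering_history
where warehouse_name = 'WAREHOUSE_A'
  and start_time >= dateadd(day, -1, current_timestamp())
order by start_time;

If the extra queries fit inside the same up-time, the credits per hour should stay roughly the same.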
Related
I am trying to understand the "latency" issue with Account Usage views.
Does the latency, let's say for Query History mentioned to be 45 min, mean it might take 45 min for a query to pull result out of Account Usage view or does it mean it might take time for data to be available in Account Usage view?
When I query Account Usage in a trial account, the query does not take much time, and the Query History view shows the latest SQL details, so I am not able to understand what the latency denotes.
Another question: if latency means the amount of time the SQL will take to pull results, I assume it will keep the warehouse in a running state, increasing the cost.
Data latency
Due to the process of extracting the data from Snowflake’s internal metadata store, the account usage views have some natural latency:
For most of the views, the latency is 2 hours (120 minutes).
For the remaining views, the latency varies between 45 minutes and 3 hours.
For details, see the list of views for each schema (in this topic). Also, note that these are all maximum time lengths; the actual latency for a given view when the view is queried may be less.
"Does the latency, let's say for Query History mentioned to be 45 min, mean it might take 45 min for a query to pull result out of Account Usage view or does it mean it might take time for data to be available in Account Usage view?"
The term latency refers to the time until the data becomes available in the Account Usage view.
It does not mean that query SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.<some_view> takes 45 minutes to execute.
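One way to convince yourself of this is to compare the newest row in the view with the current time; the query itself returns quickly, but the most recent queries may not appear yet. A rough sketch:

-- How far behind real time is ACCOUNT_USAGE.QUERY_HISTORY right now?
select current_timestamp() as now,
       max(end_time) as latest_recorded_query,
       timestampdiff(minute, max(end_time), current_timestamp()) as approx_lag_minutes
from snowflake.account_usage.query_history;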
For a TDengine database, when there are a lot of tables (for example, 5,000,000) in one database, the start time usually takes over 3 minutes. If the number of tables is less than 1,000,000, the start time is less than 1 minute, which is more acceptable.
Is there any way to reduce the database start time when there is a huge number of tables?
I am looking at the queries performed against my warehouse and finding that the credit calculation I'm using doesn't add up to what's being shown in Snowflake. As I understand it, it is supposed to use credits per second of query time, with a minimum of 60 seconds. So if a query runs for 5 seconds it would use 60 seconds' worth of credits, but if a query runs for 61 seconds it would use 61 seconds' worth of credits.
Looking at the query history, limiting only to queries performed on my warehouse, I am only seeing 5 queries for the hour in question (12).
These queries copy their results into an S3 bucket in my AWS account.
If I take the start and end times of each of these queries and total the time, I see only 455 seconds of query time. With the X-Small warehouse that I'm using (1 credit per hour), that should be only about 0.126 credits used for that hour.
But I am seeing 0.66 credits used here:
What am I missing about snowflake credit usage? Why does it appear that I am using more credits than I should?
Moving answer from comments to an actual answer (for completeness):
Snowflake costs don't reflect query runtimes; they reflect how long the warehouse is running, including idle time before it suspends.
AUTO_SUSPEND can be set to 60 seconds (or less) to more closely match the duration of queries.
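For example (MY_WH is a placeholder warehouse name; the right value depends on your workload):

-- Suspend after 60 seconds of inactivity so billed up-time tracks query activity more closely
alter warehouse my_wh set auto_suspend = 60;
-- Resume automatically when the next query arrives
alter warehouse my_wh set auto_resume = true;

Keep in mind that each time the warehouse resumes, it is billed for a minimum of 60 seconds.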
You can refer to the official Snowflake documentation for more details:
Virtual Warehouse Credit Usage
How are Credits Charged for Warehouses?
My application (industrial automation) uses SQL Server 2017 Standard Edition on a Dell T330 server with the following configuration:
Xeon E3-1200 v6
16 GB DDR4 UDIMMs
2 x 2 TB 7200 RPM HDD (RAID 1)
In this database, I am storing the following tables:
Table                    Insert interval     Float columns    Int columns
tableHistory             every 2 seconds     410              409
tableHistoryLong         every 10 minutes    410              409
tableHistoryMotors       every 2 seconds     328              327
tableHistoryMotorsLong   every 10 minutes    328              327
tableEnergy              every 700 ms        220              219
Note:
When I generate reports/graphs, my application buffers the inserts, because the system cannot insert and query at the same time; the report queries are quite heavy.
The columns hold values such as current, temperature, level, etc. This information is kept for one year.
Question
With this level of processing, could I run into performance problems?
Do I need better hardware due to high demand?
Can my application break at some point due to the hardware?
Your question may be closed as too broad but I want to elaborate more on the comments and offer additional suggestions.
How much RAM you need for adequate performance depends on the reporting queries. Factors include the number of rows touched, execution plan operators (sort, hash, etc.), number of concurrent queries. More RAM can also improve performance by avoiding IO, especially costly with spinning media.
A reporting workload (large scans) against a 1-2 TB database with traditional tables needs fast storage (SSD) and/or more RAM (hundreds of GB) to provide decent performance. The existing hardware is the worst-case scenario because data are unlikely to be cached with only 16 GB of RAM, and a single spindle can only read about 150 MB per second. Based on my rough calculation of the schema in your question, a monthly summary query of tableHistory will take about a minute just to scan 10 GB of data (assuming a clustered index on a date column). Query duration will increase with the number of concurrent queries, such that it would take at least 5 minutes per query with 5 concurrent users running the same query, due to disk bandwidth limitations. SSD storage can sustain multiple GB per second, so with the same query and RAM, the data transfer for the query above would take under 5 seconds.
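As a rough sanity check of that estimate, assuming SQL Server's default 8-byte float and 4-byte int columns: a tableHistory row is roughly 410 x 8 + 409 x 4 ≈ 4.9 KB; one insert every 2 seconds is about 43,200 rows per day, or roughly 1.3 million rows and 6+ GB per month before row overhead and the other tables, so ~10 GB scanned per month and about a minute at 150 MB/s is in the right ballpark.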
A columnstore (e.g. a clustered columnstore index), as suggested by @ConorCunninghamMSFT, will greatly reduce the amount of data transferred from storage, because only the columns specified in the query are read, and inherent columnstore compression will reduce both the size of data on disk and the amount transferred from disk. The compression savings depend heavily on the actual column values, but I'd expect 50 to 90 percent less space compared to a rowstore table.
Reporting queries against measurement data are likely to specify date-range criteria, so partitioning the columnstore by date will limit scans to the specified date range without a traditional b-tree index. Partitioning also facilitates purging for the 12-month retention requirement with sliding-window partition maintenance (partition TRUNCATE, MERGE, SPLIT), which greatly improves the performance of the purge compared to a delete query.
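A minimal T-SQL sketch of that layout, assuming a datetime2 column named InsertTime and hypothetical partition object names (the boundary values are examples only):

-- Monthly partitions on the insert timestamp
create partition function pfHistoryByMonth (datetime2)
    as range right for values ('2024-01-01', '2024-02-01', '2024-03-01');

create partition scheme psHistoryByMonth
    as partition pfHistoryByMonth all to ([PRIMARY]);

-- Clustered columnstore index built on the partition scheme
create clustered columnstore index cci_tableHistory
    on dbo.tableHistory
    on psHistoryByMonth (InsertTime);

-- Sliding-window purge: truncate the oldest partition, then merge the empty range
truncate table dbo.tableHistory with (partitions (1));
alter partition function pfHistoryByMonth() merge range ('2024-01-01');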
I have been playing around with the Snowflake Query Profile interface but am missing information about parallelism in query execution. Using a Large or XLarge warehouse, it still appears to use only two servers to execute the query. With an XLarge warehouse, a big sort could be divided into 16 parallel execution threads to fully exploit my warehouse and credits. Or could it?
Example: Having a Medium Warehouse as:
Medium Warehouse => 4 servers
Executing the following query:
select
    sum(o_totalprice) "order total",
    count(*) "number of orders",
    c.c_name "customer"
from
    orders o
    inner join customer c on c.c_custkey = o.o_custkey
where
    c.c_nationkey in (2, 7, 22)
group by
    c.c_name
Gives the following Query Plan:
[query plan screenshot]
In the execution details I cannot see anything about the participating servers:
[screenshot of execution details]
Best Regards
Jan Isaksson
In an ideal situation, Snowflake will try to split your query and let every core of the warehouse process a piece of it. For example, if you have a 2XL warehouse, you have 32 x 8 = 256 cores (each node in a warehouse has 8 cores). So, when a query is submitted, in an ideal situation Snowflake will try to divide it into 256 parts and have each core process one part.
In reality, it may not be possible to parallelize to that extent, either because the query itself cannot be broken down that way (for example, if you are calculating a median) or because the data itself prevents parallelizing it (for example, if you are running a window function on a skewed column).
Hence, it is not always true that if you move to a bigger warehouse your query performance will improve linearly.
I tested your query starting with the smallest compute size and worked up from there. The linear scaling (more compute resources resulting in improved performance) stops around Medium size, at which point there is no added performance benefit. This indicates your query is not big enough to take advantage of more compute resources, and size Small is good enough, especially considering cost optimization.
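If you want to verify where scaling flattens out for your own query, you can resize the warehouse between runs and compare durations; a rough sketch (MY_WH is a placeholder warehouse name):

alter session set use_cached_result = false;  -- avoid serving the repeat run from the result cache
alter warehouse my_wh set warehouse_size = 'SMALL';
-- run the query and note its duration in the query profile or QUERY_HISTORY
alter warehouse my_wh set warehouse_size = 'MEDIUM';
-- run the same query again and compare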