Snowflake auto scaling and concurrency - snowflake-cloud-data-platform

I have a Medium Snowflake warehouse with the following configuration:
Min clusters: 1
Max clusters: 6
Max concurrency level: 12
I am running 100 parallel queries from 100 sessions (one query per session). Processing all 100 queries on a moderate data set takes at most 21 seconds, and for most queries more than 90% of the time is spent queuing. Even though the warehouse allows up to 6 clusters, Snowflake starts only 2 clusters for the whole workload. I am pretty confused, as I was expecting all the clusters to become active instead of queries queuing longer on 2 clusters. Can you please help me here and share your experiences with auto-scaling?
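A back-of-the-envelope sketch of the slot arithmetic implied by this configuration may help frame the question. It idealistically assumes that the max concurrency level acts as a fixed number of query slots per cluster and that every query takes the same time; in practice Snowflake treats the concurrency level as an upper bound that can be lower for resource-heavy queries, so real numbers will differ. The function names are hypothetical.

```python
import math


def clusters_needed(queries: int, slots_per_cluster: int) -> int:
    """Clusters required to run every query at once (no queuing)."""
    return math.ceil(queries / slots_per_cluster)


def execution_waves(queries: int, active_clusters: int, slots_per_cluster: int) -> int:
    """Rounds of execution needed when only `active_clusters` clusters are running.

    Every round beyond the first means queries waiting in the queue for a free slot.
    """
    total_slots = active_clusters * slots_per_cluster
    return math.ceil(queries / total_slots)


QUERIES = 100            # 100 sessions, one query each
SLOTS_PER_CLUSTER = 12   # max concurrency level from the warehouse settings

print("clusters for zero queuing:", clusters_needed(QUERIES, SLOTS_PER_CLUSTER))
for clusters in range(1, 7):  # min 1 .. max 6 clusters
    waves = execution_waves(QUERIES, clusters, SLOTS_PER_CLUSTER)
    print(f"{clusters} cluster(s): {clusters * SLOTS_PER_CLUSTER} slots, {waves} wave(s)")
```

Under these assumptions, even all six clusters (72 slots) could not run all 100 queries at once, and with two clusters (24 slots) the queries would need about five rounds, so most of them spend their time waiting for a slot.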

Related

SSIS - Size of SSISDB

My SSISDB is writing a large number of entries, especially to [internal].[event_messages] and [internal].[operation_messages].
I have already set both the number of versions to keep and the log retention period to 5. After running the maintenance job, selecting the distinct dates in both tables shows that only 6 dates are left (including today), as one would expect. Still, each of those tables has about 6.5 million entries, and the database totals 35 GB (again, with a retention period of 5 days).
In this particular package, I am using a lot of loops and I suspect that they are causing this rapid expansion.
Does anyone have an idea how to reduce the number of operation and event messages written by the individual components? Or do you have an idea what else might be causing this rate of growth? I have packages that have been running on other machines for over a year with a retention period of 150 days, and their SSISDB is only about 1 GB.
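A quick comparison of the growth rates quoted above puts the difference in perspective. This sketch assumes the 35 GB is spread evenly across the 6 retained dates and the 1 GB evenly across 150 days, which is only a rough approximation; the helper names are hypothetical.

```python
def gb_per_day(total_gb: float, retained_days: int) -> float:
    """Average SSISDB size per retained day (assumes an even distribution)."""
    return total_gb / retained_days


def rows_per_day(total_rows: int, retained_days: int) -> float:
    """Average rows written per day to one logging table."""
    return total_rows / retained_days


this_server = gb_per_day(35, 6)      # 35 GB over 6 dates (5-day retention + today)
other_servers = gb_per_day(1, 150)   # about 1 GB over 150 days

print(f"this server:    {this_server:.2f} GB/day")
print(f"other servers:  {other_servers * 1024:.1f} MB/day")
print(f"ratio:          {this_server / other_servers:.0f}x")
print(f"event_messages: ~{rows_per_day(6_500_000, 6):,.0f} rows/day")
```

On these rough numbers, this server logs several hundred times more data per day than the others, which suggests that the per-execution message volume, not the retention settings, is what differs.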
Thank You!

SQL Server optimization

My application (industrial automation) uses SQL Server 2017 Standard Edition on a Dell T330 server with the following configuration:
Xeon E3-1200 v6
16 GB DDR4 UDIMMs
2 x 2 TB HDD, 7200 RPM (RAID 1)
In this database, I store the following tables:
Table: tableHistory
Insert interval: every 2 seconds
410 columns of type float
409 columns of type int
--
Table: tableHistoryLong
Insert interval: every 10 minutes
410 columns of type float
409 columns of type int
--
Table: tableHistoryMotors
Insert interval: every 2 seconds
328 columns of type float
327 columns of type int
--
Table: tableHistoryMotorsLong
Insert interval: every 10 minutes
328 columns of type float
327 columns of type int
--
Table: tableEnergy
Insert interval: every 700 milliseconds
220 columns of type float
219 columns of type int
Note:
When I generate reports/graphs, my application holds new inserts in a buffer, because the system cannot insert and query at the same time: the queries are very heavy.
The columns hold values of current, temperature, level, etc. This data is kept for one year.
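The insert buffering described in the note could look roughly like the following. This is a hypothetical sketch, not the application's actual code: the class name, the `write_batch` callback, and the start/end report hooks are assumptions, and a real implementation would also need thread safety and a final flush on shutdown.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Row = Dict[str, float]


class InsertBuffer:
    """Hypothetical: defer inserts while a report query runs, then flush them."""

    def __init__(self, write_batch: Callable[[List[Row]], None]) -> None:
        self._write_batch = write_batch      # e.g. a bulk INSERT into the history table
        self._pending: Deque[Row] = deque()
        self.report_running = False

    def insert(self, row: Row) -> None:
        if self.report_running:
            self._pending.append(row)        # hold the row while the report runs
        else:
            self._write_batch([row])         # no report active: write immediately

    def start_report(self) -> None:
        self.report_running = True

    def end_report(self) -> None:
        self.report_running = False
        if self._pending:
            batch = list(self._pending)
            self._pending.clear()
            self._write_batch(batch)         # flush everything held as one batch
```

Usage: call `start_report()` before running a report query and `end_report()` afterwards; rows arriving in between are written as a single batch once the report finishes.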
Question
Could this level of processing cause performance problems?
Do I need better hardware because of the high demand?
Could my application break at some point because of the hardware?
Your question may be closed as too broad, but I want to elaborate on the comments and offer additional suggestions.
How much RAM you need for adequate performance depends on the reporting queries. Factors include the number of rows touched, execution plan operators (sort, hash, etc.), and the number of concurrent queries. More RAM can also improve performance by avoiding IO, which is especially costly with spinning media.
A reporting workload (large scans) against a 1-2 TB database with traditional tables needs fast storage (SSD) and/or more RAM (hundreds of GB) to provide decent performance. The existing hardware is the worst-case scenario because data are unlikely to be cached with only 16 GB RAM, and a single spindle can only read about 150 MB per second. Based on my rough calculation from the schema in your question, a monthly summary query of tableHistory will take about a minute just to scan 10 GB of data (assuming a clustered index on a date column). Query duration will increase with the number of concurrent queries, such that it would take at least 5 minutes per query with 5 concurrent users running the same query, due to disk bandwidth limitations. SSD storage can sustain multiple GB per second, so with the same query and RAM, the data transfer for the query above will take under 5 seconds.
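The monthly-scan estimate for tableHistory can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not the answer's exact calculation: float is taken as 8 bytes and int as 4 bytes, row overhead is ignored, a month is 30 days, and "multiple GB per second" for SSD is taken as 3 GB/s. The function names are hypothetical.

```python
import math

PAGE_BYTES = 8192              # SQL Server data page size
PAGE_PAYLOAD = 8096            # bytes left for rows after the 96-byte page header
FLOAT_BYTES, INT_BYTES = 8, 4  # float (default precision) and int storage sizes


def monthly_scan_gb(float_cols: int, int_cols: int, insert_interval_s: int, days: int = 30) -> float:
    """Approximate GB read when scanning one month of a wide history table."""
    row_bytes = float_cols * FLOAT_BYTES + int_cols * INT_BYTES  # row overhead ignored
    rows_per_page = PAGE_PAYLOAD // row_bytes                    # assumes a row fits on one page
    rows = days * 24 * 3600 // insert_interval_s
    pages = math.ceil(rows / rows_per_page)
    return pages * PAGE_BYTES / 1e9


def scan_seconds(gb: float, mb_per_s: float, concurrent: int = 1) -> float:
    """Transfer time when `concurrent` identical scans share the same bandwidth."""
    return gb * 1000 / mb_per_s * concurrent


gb = monthly_scan_gb(410, 409, 2)  # tableHistory: one insert every 2 seconds
print(f"monthly scan:      {gb:.1f} GB")
print(f"HDD, 1 query:      {scan_seconds(gb, 150):.0f} s")
print(f"HDD, 5 concurrent: {scan_seconds(gb, 150, 5) / 60:.1f} min each")
print(f"SSD, 1 query:      {scan_seconds(gb, 3000):.1f} s")  # 3 GB/s assumed
```

With these assumptions the sketch lands close to the figures above: about 10.6 GB per month, roughly 70 seconds on one 150 MB/s spindle, about 6 minutes each with five concurrent scans, and a few seconds on SSD. A key detail is that a row of about 4.9 KB leaves room for only one row per 8 KB page, so about 40% of every page is unused.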
A columnstore (e.g. a clustered columnstore index), as suggested by @ConorCunninghamMSFT, will greatly reduce the amount of data transferred from storage, because only data for the columns specified in the query are read, and inherent columnstore compression will reduce both the size of data on disk and the amount transferred from disk. The compression savings will depend greatly on the actual column values, but I'd expect 50 to 90 percent less space compared to a rowstore table.
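To illustrate why reading only the referenced columns matters, here is a rough comparison of bytes read for one month of tableHistory. It is a sketch with loudly stated assumptions: the hypothetical report reads a timestamp plus four float columns (the 8-byte timestamp width is an assumption), the 50 to 90 percent savings range from the paragraph above is applied uniformly, and columnstore metadata and segment elimination are ignored.

```python
import math

ROWS_PER_MONTH = 30 * 24 * 3600 // 2   # one insert every 2 seconds
PAGE_BYTES = 8192


def rowstore_bytes(rows: int, rows_per_page: int = 1) -> int:
    """A rowstore scan reads whole pages, i.e. every column of every row.

    tableHistory rows (~4.9 KB) fit one per 8 KB page, hence the default of 1.
    """
    return math.ceil(rows / rows_per_page) * PAGE_BYTES


def columnstore_bytes(rows: int, column_widths: list, savings: float) -> float:
    """A columnstore scan reads only referenced columns, shrunk by compression."""
    raw = rows * sum(column_widths)
    return raw * (1 - savings)


referenced = [8, 8, 8, 8, 8]  # hypothetical: timestamp + 4 float measurement columns
row_total = rowstore_bytes(ROWS_PER_MONTH)
for savings in (0.5, 0.9):
    col_total = columnstore_bytes(ROWS_PER_MONTH, referenced, savings)
    print(f"{savings:.0%} savings: {col_total / 1e6:.1f} MB vs {row_total / 1e9:.1f} GB rowstore")
```

The gap comes mostly from column projection: the rowstore scan reads all 819 columns, while the columnstore reads only the handful a report needs.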
Reporting queries against measurement data are likely to specify date range criteria, so partitioning the columnstore by date will limit scans to the specified date range without a traditional b-tree index. Partitioning will also facilitate purging for the 12-month retention requirement with sliding-window partition maintenance (partition TRUNCATE, MERGE, SPLIT), greatly improving the performance of that process compared to a delete query.
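The sliding-window bookkeeping for the 12-month retention boils down to date arithmetic. A minimal sketch, assuming one partition per month with first-of-month boundaries; it only computes which boundaries to act on and does not generate or run the actual SQL Server partition statements. The function names and the example date are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def add_months(first_of_month: date, months: int) -> date:
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def sliding_window(today: date, months_kept: int = 12) -> Tuple[List[date], date, date]:
    """Return (retained month boundaries, boundary to SPLIT in, month to purge)."""
    current = date(today.year, today.month, 1)
    retained = [add_months(current, -i) for i in range(months_kept - 1, -1, -1)]
    next_boundary = add_months(current, 1)       # SPLIT: empty partition for next month
    expired = add_months(current, -months_kept)  # TRUNCATE this month's partition, then MERGE it away
    return retained, next_boundary, expired


retained, next_boundary, expired = sliding_window(date(2020, 3, 15))
print("keep:", retained[0], "to", retained[-1])
print("split in:", next_boundary)
print("truncate + merge:", expired)
```

Running this once a month keeps exactly twelve monthly partitions of data, and purging becomes a metadata operation on one partition instead of a large delete.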

Cost vs. number of queries in Snowflake

Say I have warehouse A with 5 queries running between 1 PM and 2 PM, and it costs me X credits. If I increase the number of queries from 5 to 10 in the same time window, does that cost me X + X = 2X credits (assuming the 10 queries are not all the same)?
Load on the warehouse
Credits used for the warehouse during the same time frame.
Details of credits used, from WAREHOUSE_METERING_HISTORY
To reduce credit usage, you have to use the warehouse effectively. If 2X queries are executed within the same time frame on the same warehouse, without increasing the warehouse size, you will be charged the same amount. The number of executions does not incur cost; cost is incurred for warehouse uptime.
If your warehouse can handle nX queries within the same time frame, you should use it that way. These are some common considerations for optimized credit usage:
Grouping queries together
Maximizing utilization of warehouse uptime
Minimizing warehouse idle time
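A small model of the billing rule described above: credits accrue for warehouse (cluster) uptime, not per query. It assumes Snowflake's standard warehouse rates (Medium = 4 credits per hour per running cluster) and per-second billing with a 60-second minimum each time a cluster starts; check current Snowflake documentation for your account. The function name is hypothetical.

```python
from typing import List

# Credits per hour per running cluster for standard warehouses (assumed rates).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}
MIN_BILLED_SECONDS = 60  # minimum charge each time a cluster starts or resumes


def warehouse_credits(size: str, cluster_runs_seconds: List[float]) -> float:
    """Credits for a set of continuous cluster runs.

    The number of queries is deliberately not an input: only uptime is billed.
    """
    rate = CREDITS_PER_HOUR[size]
    return sum(rate * max(run, MIN_BILLED_SECONDS) / 3600 for run in cluster_runs_seconds)


# 1 PM to 2 PM on one Medium cluster: the same cost whether it ran 5 queries or 10.
print(warehouse_credits("MEDIUM", [3600]))
# If the extra queries made a second cluster run for 20 minutes, cost rises with that uptime.
print(round(warehouse_credits("MEDIUM", [3600, 1200]), 2))
```

The second case is the exception to "same amount": doubling the queries costs more only when it keeps the warehouse running longer or makes additional clusters start.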

SQL Server 2012: CPU/duration variation for a single clustered index seek

We are running a daily batch and sometimes see runtime differences of a factor of 20.
Analyzing a trace that recorded both fast and slow timeframes, I isolated a SELECT statement returning a single row from a clustered index, which logs a duration of 1,101 microseconds (3 logical reads) in the "fast" timeframe.
A few minutes later, the same SELECT with the same plan took 28,275 microseconds (3 logical reads).
Both timeframes (fast/slow) are before working hours, and there is almost no other activity on the server.
It is an AlwaysOn cluster running SQL Server 2012, with CPU usage always below 30% and, thanks to lots of RAM, low IO activity.
To us, the trace does not reveal a reason for the long duration. Any suggestions on what we could trace to gain more insight?
Thanks
Juerg
Addition:
I added tracing for some of the actions and found another strange thing. The app requests the same data from the same table with different PKs using dynamic SQL commands (select * from t1 where OID='...'). It does this 4 times in a row, and the execution plan is the same (1 index seek and 1 key lookup) for all 4 selects. Each select triggers 8 logical reads. 3 of the 4 selects log 0 ms CPU time in the trace, yet 1 logs 15 ms?
Am I right that even a physical read (I can't see that in the trace, but we have lots of RAM and I doubt that a physical read happens) should not increase the CPU time? What could cause that counter to be so high compared to the other reads?

Solr Architecture/Performance

Number of rows in the SQL table (indexed so far using Solr): 1 million
Total size of data in the table: 4 GB
Total index size: 3.5 GB
Total number of rows that I have to index: 20 million (approximately 100 GB of data) and growing
What are the best practices for distributing the index? In other words, when should I distribute, and is there a magic number for index size per instance?
For 1 million rows alone, a single Solr instance running on a VM takes roughly 2.5 hours to index. So for 20 million it would take roughly 60-70 hours, which is far too long.
What would be the best distributed architecture for my case? It would be great if people could share their best practices and experience.
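The extrapolation in the question can be made explicit. This sketch assumes that indexing time and index size scale linearly with row count and that shards index fully in parallel on independent hardware; both are optimistic simplifications. Note that a purely linear extrapolation gives 50 hours, somewhat below the 60-70 hours estimated in the question. The shard counts are arbitrary examples and the function name is hypothetical.

```python
from typing import Tuple


def extrapolate(measured_rows: float, measured_hours: float, measured_index_gb: float,
                target_rows: float, shards: int = 1) -> Tuple[float, float]:
    """Linear estimate of (hours to index, index size in GB) per shard."""
    scale = target_rows / measured_rows
    total_hours = measured_hours * scale
    total_index_gb = measured_index_gb * scale
    return total_hours / shards, total_index_gb / shards


for shards in (1, 2, 4):
    hours, gb = extrapolate(1_000_000, 2.5, 3.5, 20_000_000, shards)
    print(f"{shards} shard(s): ~{hours:.1f} h to index, ~{gb:.1f} GB index per shard")
```

Measuring a second data point (for example 2 or 5 million rows) would show how far real indexing time departs from this linear model before choosing a shard count.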
