Why my SnappyData cluster faced with slow query about once a day - database

When my cluster runs for a certain time(maybe a day, maybe two days), some queries may become very slow, about 2~10min to finish, when this happens, I need to restart the whole cluster, and the query become normal, but after some time, very slow queries happen again

The query response time depends on multiple factor including
1. Table size, if table size grows with time then response time will also increase
2. If it is open source version then time spent in GC pauses, which in turn will depend on number of objects/grabage present in the JVM Heap
3. Number of Concurrent queries being run
4. Amount of data overflown to disk
You will need to describe in detail your usage pattern of snappydata. Only then It would be possible to characterise the issue.
Some of the questions that should be answered are like
1. What is cluster size?
2. What are the table sizes?
3. Are writes happening continuously on the tables or only queries are getting executed?
You can engage us at slack channel to provide us informations related to your clusters.

Related

Query compilation and provisioning times

What does it mean there is a longer time for COMPILATION_TIME, QUEUED_PROVISIONING_TIME or both more than usual?
I have a query runs every couple of minutes and it usually takes less than 200 milliseconds for compilation and 0 for provisioning. There are 2 instances in the last couple of days the values are more than 4000 for compilation and more than 100000 for provisioning.
Is that mean warehouse was being resumed and there was a hiccup?
COMPILATION_TIME:
The SQL is parsed and simplified, and the tables meta data is loaded. Thus a compile for select a,b,c from table_name will be fractally faster than select * from table_name because the meta data is not needed from every partition to know the final shape.
Super fragmented tables, can give poor compile performance as there is more meta data to load. Fragmentation comes from many small writes/deletes/updates.
Doing very large INSERT statements can give horrible compile performance. We did a lift-and-shift and did all data loading via INSERT, just avoid..
PRIOVISIONING_TIME is the amount of time to setup the hardware, this occurs for two main reasons ,you are turning on 3X, 4X, 5X, 6X servers and it can take minutes just to allocate those volume of servers.
Or there is failure, sometime around releases there can be a little instability, where a query fails on the "new" release, and query is rolled back to older instances, which you would see in the profile as 1, 1001. But sometimes there has been problems in the provisioning infrastructure (I not seen it for a few years, but am not monitoring for it presently).
But I would think you will mostly see this on a on going basis for the first reason.
The compilation process involves query parsing, semantic checks, query rewrite components, reading object metadata, table pruning, evaluating certain heuristics such as filter push-downs, plan generations based upon the cost-based optimization, etc., which totally accounts for the COMPILATION_TIME.
QUEUED_PROVISIONING_TIME refers to Time (in milliseconds) spent in the warehouse queue, waiting for the warehouse compute resources to provision, due to warehouse creation, resume, or resize.
https://docs.snowflake.com/en/sql-reference/functions/query_history.html
To understand the reason behind the query taking long time recently in detail, the query ID needs to be analysed. You can raise a support case to Snowflake support with the problematic query ID to have the details checked.

Snowflake multi-cluster warehouse performance vs single warehouse with large warehouse size

I am very new to Snowflake and while working with snowflake I had conflict between the below 2 options.
Single warehouse with size X-Large (16 credits / hour)
Multi-cluster (with max clusters=2 & min clusters=2) with size Large (8 credits / hour)
Considering the above 2 options
Is there any advantage I can get by choosing 2nd option in terms of performance?
Note: I know the advantages of multi-cluster over a single warehouse. Please share your answer for this specific scenario (when min = max).
So the things that happen in running a query are.
belong I am going to just use single to mean the single instance and 'multi` to mean the multi instance cluster, of which when we run a query it is only ever on one instance.
Reading\Writing IO from your storage layer:
Here a single has twice the IO over the multi thus if your query is IO saturated the single is the better choice.
Parallel steps:
So if you have a GROUP BY over a high cardinality columns, both the single and multi should be equally good. If you have a low cardinality but billions of rows, the smaller instance might give better results as those complex steps cannot be broken over the larger cluster size of the single instance. But this is most likely lost in the wash if you have many concurrent queries.
Many queries / Noisy neighbour:
If you have hundreds of queries hitting in waves the larger single instance is worse at starting those queries, as it just has less concurrent tasks at once, and a single very large query which can flush caches, or just dominate cluster, means you stop handling normal/small queries. Where-as having the mutli cluster allow if only one "super heavy" query comes in, you only stall half your normal queries.
Other thoughts
It also really depends on your load patterns, at my last job, we had auto-scaling cluster of SMALL instances used to used to answer our read queries of dashboards, reports, and we allowed it to run a little over provisioned, so things were snappy.
Where-as our data loading ran on second auto-scaling cluster of MEDIUM instances, and which we overloaded on purpose, as we were trying to load data the fastest/cheapest. And in non-peak times we programmatically reduced the auto-scalling MAX to almost starve the load. But would do some expensive reprocessing on a LARGE instance via those saved credits in "the middle of the night" and also our loading tasks had the ability to spin up exclusive LARGE+ size warehouses to do one off rebuilds, as this was all IO bound work, and thus the smaller the window of "outage" the better the system was, and the IO scale linear, so the total cost was the same.
Which is all to say, "what is best" really depends on what you are doing, your budget, and the trade offs you are prepared for. The golden thing about snowflake it is not like a classic DB where you have to pick the size and get it right, pick one, and watch it, if it's struggling change it on the fly. We did this a number of times when a release of our code or snowflake changed the performance of some critical SQL, we would jump in, and double or triple the instance count, or size, to get past the situation, while trying to fix or work around SF issues, or awaiting SF to roll a release back. for a couple hours generally spending more credits is not budget braking. This flexibility also means you can just experiment, "what happens if we trying 4x smaller instance.." "oh nothing... look we just saved heaps of money"..
If you have min=max=2 then you permanently have 2 warehouses running (as long as they are not suspended). If you configure your multi-cluster warehouse like this then you lose a lot of the advantages but for your specific use case it might make sense, I suppose
Based on your comment, here is my answer:
In both scenarios, you will have the same resources to process your queries. The important difference would be about running single heavy queries. As you may know, a single query can not spawn to multiple clusters (yet), so when you run a query in your multi-cluster warehouse, it will be processed on one of the Large warehouses (and use max 8 nodes).
If you run the same query on your single XL warehouse, it can be executed by (max) 16 nodes. So if you will run heavy queries which requires more memory and CPU, using a single XL warehouse would be better for you.
About concurrency, there is a parameter named "MAX_CONCURRENCY_LEVEL". Its default value is 8, and it limits maximum number of concurrent executions per warehouse. If you do not change it, your single XL warehouse will execute a maximum of 8 queries concurrently, while your multi-cluster warehouse can execute 16 queries concurrently.
https://docs.snowflake.com/en/sql-reference/parameters.html#max-concurrency-level
You may increase this parameter, and provide same concurrency on both single XL and multi-cluster L warehouse. But in this case, you should be careful when you runn heavy and light queries together. Because one query may use most of the resources of the warehouse, and your light queries may have less resources and take a longer time. So I would recommend using a multi-cluster warehouse, if you will have "relatively" light/concurrent queries.

Maximum number of records for a custom object in salesforce.com

What is the maximum number of records within a single custom object in salesforce.com?
There does not seem to be a limit indicated in https://login.salesforce.com/help/doc/en/limits.htm
But of course, there has to be a limit of some kind. EG: Could 250 million records be stored in a single salesforce.com custom object?
As far as I'm aware the only limit is your data storage, you can see what you've used by going to Setup -> Administration Setup -> Data Management -> Storage Usage.
In one of the Orgs I work with I can see one object has almost 2GB of data for just under a million records, and this accounts for a little over a third of the storage available. Your storage space depends on your Salesforce Edition and number of users. See here for details.
I've seen the performance issue as well, though after about 1-2M records the performance hit appears magically to plateau, or at least it didn't appear to significantly slow down between 1M and 10M. I wonder if orgs are tier-tuned based on volume... :/
But regardless of this, there are other challenges which make it less than ideal for big data. Even though they've increased the SOQL governor limit to permit up to 50 million records to be retrieved in one call, you're still strapped with a 200,000 line execution limit in Apex and a 10K DML limit (per execution thread). These can be bypassed through Batch Apex, yet this has limitations as well. You can only execute 250K batches in 24 hours and only have 5 batches running at any given time.
So... the moral of the story seems to be that even if you managed to get a billion records into a custom object, you really can't do much with the data at that scale anyway. Therefore, it's effectively not the right tool for that job in its current state.
2-cents
LaceySnr is correct. However, there is an inverse relationship between the number of records for an object and performance. Any part of the system that filters on that object will be impacted, such as views, reports, SOQL queries, etc.
It's hard to talk specific numbers since salesforce has upwards of a dozen server clusters, each with their own performance characteristics. And there's probably a lot of dynamic performance management that occurs regularly. But, in the past I've seen performance issues start to creep in around 2M records. One possible remedy is you can ask salesforce to index fields that you plan to filter on.

Comment post scalability: Top n per user, 1 update, heavy read

Here's the situation. Multi-million user website. Each user's page has a message section. Anyone can visit a user's page, where they can leave a message or view the last 100 messages.
Messages are short pieces of txt with some extra meta-data. Every message has to be stored permanently, the only thing that must be real-time quick is the message updates and reading (people use it as chat). A count of messages will be read very often to check for changes. Periodically, it's ok to archive off the old messages (those > 100), but they must be accessible.
Currently all in one big DB table, and contention between people reading the messages lists and sending more updates is becoming an issue.
If you had to re-architect the system, what storage mechanism / caching would you use? what kind of computer science learning can be used here? (eg collections, list access etc)
Some general thoughts, not particular to any specific technology:
Partition the data by user ID. The idea is that you can uniformly divide the user space to distinct partitions of roughly the same size. You can use an appropriate hashing function to divide users across partitions. Ultimately, each partition belongs on a separate machine. However, even on different tables/databases on the same machine this will eliminate some of the contention. Partitioning limits contention, and opens the door to scaling "linearly" in the future. This helps with load distribution and scale-out too.
When picking a hashing function to partition the records, look for one that minimizes the number of records that will have to be moved should partitions be added/removed.
Like many other applications, we could assume the use of the service follows a power law curve: few of the user pages cause much of the traffic, followed by a long tail. A caching scheme can take advantage of that. The steeper the curve, the more effective caching will be. Given the short messages, if each page shows 100 messages, and each message is 100 bytes on average, you could fit about 100,000 top-pages in 1GB of RAM cache. Those cached pages could be written lazily to the database. Out of 10 Mil users, 100,000 is in the ballpark for making a difference.
Partition the web servers, possibly using the same hashing scheme. This lets you hold separate RAM caches without contention. The potential benefit is increasing the cache size as the number of users grows.
If appropriate for your environment, one approach for ensuring new messages are eventually written to the database is to place them in a persistent message queue, right after placing them in the RAM cache. The queue suffers no contention, and helps ensure messages are not lost upon machine failure.
One simple solution could be to denormalize your data, and store pre-calculated aggregates in a separate table, e.g. a MESSAGE_COUNTS table which has a column for the user ID and a column for their message count. When the main messages table is updated, then re-calculate the aggregate.
It's just shifting the bottleneck from one place to another, but it might move it somewhere that's less of a burden.

The best way to design a Reservation based table

One of my Clients has a reservation based system. Similar to air lines. Running on MS SQL 2005.
The way the previous company has designed it is to create an allocation as a set of rows.
Simple Example Being:
AllocationId | SeatNumber | IsSold
1234 | A01 | 0
1234 | A02 | 0
In the process of selling a seat the system will establish an update lock on the table.
We have a problem at the moment where the locking process is running slow and we are looking at ways to speed it up.
The table is already efficiently index, so we are looking at a hardware solution to speed up the process. The table is about 5 mil active rows and sits on a RAID 50 SAS array.
I am assuming hard disk seek time is going to be the limiting factor in speeding up update locks when you have 5mil rows and are updating 2-5 rows at a time (I could be wrong).
I've herd about people using index partition over several disk arrays, has anyone had similar experiences with trying to speed up locking? can anyone give me some advise onto a possible solution on what hardware might be able to be upgraded or what technology we can take advantage of in order to speed up the update locks (without moving to a cluster)?
One last try…
It is clear that there are too many locks hold for too long.
Once the system starts slowing down
due to too many locks there is no
point in starting more transactions.
Therefore you should benchmark the system to find out the optimal number of currant transaction, then use some queue system (or otherwise) to limit the number of currant transaction. Sql Server may have some setting (number of active connections etc) to help, otherwise you will have to write this in your application code.
Oracle is good at allowing reads to bypass writes, however SqlServer is not as standared...
Therefore I would split the stored proc to use two transactions, the first transaction should just:
be a SNAPSHOT (or READ UNCOMMITTED) transaction
find the “Id” of the rows for the seats you wish to sell.
You should then commit (or abort) this transaction,
and use a 2nd (hopefully very short) transaction that
Most likcly is READ COMMITTED, (or maybe SERIALIZABLE)
Selects each row for update (use a locking hint)
Check it has not been sold in the mean time (abort and start again if it has)
Set the “IsSold” flag on the row
(You may be able to the above in a single update statement using “in”, and then check that the expected number of rows were updated)
Sorry sometimes you do need to understant what each time of transaction does and how locking works in detail.
If the table is smaller, then the
update is shorter and the locks are
hold for less time.
Therefore consider splitting the table:
so you have a table that JUST contains “AllocationId” and “IsSold”.
This table could be stored as a single btree (index organized table on AllocationId)
As all the other indexes will be on the table that contrains the details of the seat, no indexes should be locked by the update.
I don't think you'd getting anything out of table partitioning -- the only improvement you'd get would be in fewer disk reads from a smaller (shorter) index trees (each read will hit each level of the index at least once, so the fewer levels the quicker the read.) However, I've got a table with a 4M+ row partition, indexed on 4 columns, net 10 byte key length. It fits in three index levels, with the topmost level 42.6% full. Assuming you had something similar, it seems reasonable that partitioning might only remove one level from the tree, and I doubt that's much of an improvement.
Some off the-cuff hardward ideas:
Raid 5 (and 50) can be slower on writes, because of the parity calculation. Not an issue (or so I'm told) if the disk I/O cache is large enough to handle the workload, but if that's flooded you might want to look at raid 10.
Partition the table across multiple drive arrays. Take two (or more) Raid arrays, distribute the table across the volumes[files/file groups, with or without table partitioning or partitioned views], and you've got twice the disk I/O speed, depending on where the data lies relative to the queries retrieving it. (If everythings on array #1 and array #2 is idle, you've gained nothing.)
Worst case, there's probably leading edge or bleeding edge technology out there that will blow your socks off. If it's critical to your business and you've got the budget, might be worth some serious research.
How long is the update lock hold for?
Why is the lock on the “table” not just the “rows” being sold?
If the lock is hold for more then a
faction of a second that is likely to
be your problem. SqlServer does not
like you holding locks while users
fill in web forms etc.
With SqlServer, you have to implement a “shopping cart” yourself, by temporary reserving the seat until the user pays for it. E.g add a “IsReserved” and “ReservedAt” colunn, then any seats that has been reserved for more then n minutes should be automatically unreserved.
This is a hard problem, as a shopper does not expect a seat that is in stock to be sold to someone else where he is checking out. However you don’t know if the shopper will ever complete the checkout. So how do you show it on a UI. Think about having a look at what other booking websites do then copy one that your users already know how to use.
(Oracle can sometimes cope with lock being kept for a long time, but even Oracle is a lot faster and happier if you keep your locking short.)
I would first try to figure out why the you are locking the table rather than just a row.
One thing to check out is the Execution plan of the Update statement to see what Indexes it causes to be updated and then make sure that row_level_lock and page_level_lock are enabled on those indexes.
You can do so with the following statement.
Select allow_row_locks, allow_page_locks from sys.indexes where name = 'IndexNameHere'
Here are a few ideas:
Make sure your data and logs are on separate spindles, to maximize write performance.
Configure your drives to only use the first 30% or so for data, and have the remainder be for backups (minimize seek / random access times).
Use RAID 10 for the log volume; add more spindles as needed for performance (write performance is driven by the speed of the log)
Make sure your server has enough RAM. Ideally, everything needed for a transaction should be in memory before the transaction starts, to minimize lock times (consider pre-caching). There are a bunch of performance counters you can check for this.
Partitioning may help, but it depends a lot on the details of your app and data...
I'm assuming that the T-SQL, indexes, transaction size, etc, have already been optimized.
In case it helps, I talk about this subject in detail in my book (including SSDs, disk array optimization, etc) -- Ultra-Fast ASP.NET.

Resources