I am rebuilding some indexes in Azure SQL using a fill factor of 80 (recommended by the company who developed the application, who are not experts on the database) and after doing this queries got a LOT slower. We noticed that now they were taking a longer time in "Network I/O". Does anybody know what the problem might be?
Fill factor is not a silver bullet and has its tradeoffs. https://www.mssqltips.com/sqlservertip/5908/what-is-the-best-value-for-fill-factor-in-sql-server/
It is important to note the effect a lower fill factor has on the underlying data pages and index pages that make up your table:
With a fill factor of 80, each page is left 20% empty at rebuild time, which works out to roughly 25% more pages allocated for the same number of records!
This causes increased I/O. Depending on your Azure storage/compute plan you may be hitting a ceiling and need to bump up your IOPS.
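To put rough numbers on the page-count effect, here is a back-of-the-envelope sketch (the row size, row count, and usable page size are invented for illustration; real per-page and per-row overheads will shift the figures somewhat):

```python
# With fill factor 100 each 8 KB page is packed full; with fill factor 80
# only ~80% of each page is filled at rebuild time, so the same rows need
# ~25% more pages -- and every scan reads ~25% more pages.

ROW_SIZE = 200        # bytes per row (assumed)
PAGE_BYTES = 8096     # approximate usable bytes in an 8 KB SQL Server page
ROWS = 1_000_000

rows_per_page_full = PAGE_BYTES // ROW_SIZE            # fill factor 100 -> 40
rows_per_page_ff80 = int(rows_per_page_full * 0.80)    # fill factor 80  -> 32

pages_full = -(-ROWS // rows_per_page_full)   # ceiling division -> 25000
pages_ff80 = -(-ROWS // rows_per_page_ff80)   # -> 31250

print(pages_full, pages_ff80, f"{pages_ff80 / pages_full - 1:.0%}")  # 25000 31250 25%
```

The upside of the empty space is fewer page splits on inserts and updates; the price is the extra read I/O shown above.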
Now, if you are not running out of IOPS, there's more to look into. Is it possible that the index rebuild operation has not completed yet and the index is not being used for query optimization? Profiler or the execution plan can confirm this.
I'd say that if you have a very large table and want to speed things up dramatically, your best bet is partitioning on the column most commonly used to address the data.
See also: https://www.sqlshack.com/database-table-partitioning-sql-server/
Try to identify queries returning large data sets to client hosts. Large result sets lead to unnecessary network utilization and client application processing. Make sure queries return only what is needed, using filtering and aggregation, and make sure no duplicate rows are returned unnecessarily.
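As a minimal illustration of pushing the filtering and aggregation to the server (sqlite3 stands in for Azure SQL here, and the table and column names are made up):

```python
import sqlite3

# Stand-in database to illustrate the point; the same idea applies to
# Azure SQL: filter and aggregate on the server, don't ship the whole
# table to the client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, "EU" if i % 2 else "US", i * 1.5) for i in range(10_000)])

# Bad: drags every row over the (network) connection, then filters client-side.
all_rows = conn.execute("SELECT * FROM orders").fetchall()
eu_total = sum(r[2] for r in all_rows if r[1] == "EU")

# Better: one aggregated row crosses the wire.
(eu_total_server,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'EU'").fetchone()

assert eu_total == eu_total_server
```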
Another possible cause of that wait on Azure SQL is that the client application doesn't fetch results fast enough, and therefore never signals Azure SQL that the result set has been received. On the client side, drain the results into memory first and only then do further processing. Also make sure the client application is not under so much stress that it is unable to consume the results quickly.
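The "drain first, process later" pattern looks roughly like this (sqlite3 stands in for the Azure SQL connection; the point is the order of operations, not the driver):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(i, i * 0.1) for i in range(1000)])

cur = conn.execute("SELECT id, value FROM readings")

# Drain the whole result set into client memory first, so the server
# sees the result set acknowledged as quickly as possible...
rows = cur.fetchall()

# ...and only then do the slow per-row work, without the server waiting.
processed = [value * 2 for (_id, value) in rows]
```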
One last thing: make sure Azure SQL and the application are in the same region, so that no data is transferred across regions or availability zones.
My team and I have been using Snowflake daily for the past eight months to transform/enrich our data (with DBT) and make it available in other tools.
While the platform seems great for heavy/long-running queries on large datasets and for powering analytics tools such as Metabase and Mode, it just doesn't seem to behave well when we need to run really small queries ("grab me one line of table A") behind a high-demand API. What I mean by that is that Snowflake sometimes takes as much as 100 ms or even 300 ms on an XLARGE-2XLARGE warehouse to fetch one row from a fairly small table (200k computed records/aggregates). Added to the network latency, that makes for a very poor setup when we want to use it as the backend of a high-demand analytics API.
We've tested multiple setups with Node.js + Fastify as well as Python + FastAPI, with connection pooling (10-20-50-100 connections) and without (one connection per request, not ideal at all), deployed in the same AWS region as our Snowflake deployment. Yet we weren't able to sustain anything close to 50-100 requests/sec with 1 s latency (acceptable); instead we only got 10-20 requests/sec with latency as high as 15-30 s. Both languages/frameworks behave well on their own, even when just acquiring/releasing connections; what actually takes the longest and demands a lot of I/O is running the queries and waiting for a response. We've yet to try a Golang setup, but it all seems to boil down to how quickly Snowflake can return results for such queries.
We'd really like to use Snowflake as the database powering a read-only REST API that is expected to receive something like 300 requests/second, while keeping response times in the neighborhood of 1 s. (But we are also ready to accept that it was just not meant for that.)
Is anyone using Snowflake in a similar setup? What is the best tool/config to get the most out of Snowflake in such conditions? Should we spin up many servers and hope that we'll get to a decent request rate? Or should we just copy transformed data over to something like Postgres to be able to have better response times?
I don't claim to be the authoritative answer on this, so people can feel free to correct me, but:
At the end of the day, you're trying to use Snowflake for something it's not optimized for. First, I'm going to run SELECT 1; to demonstrate the lower bound of latency you can ever expect to see. The result takes 40 ms to return. Looking at the breakdown, that is 21 ms for the query compiler and 19 ms to execute it. The compiler is designed to come up with really smart ways to process huge complex queries, not to compile small simple queries quickly.
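A trivial harness for measuring that floor yourself (sqlite3 is used here only so the snippet runs anywhere; against Snowflake you would issue the same SELECT 1 through its connector and expect tens of milliseconds rather than microseconds):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

samples = []
for _ in range(100):
    t0 = time.perf_counter()
    conn.execute("SELECT 1").fetchone()
    samples.append(time.perf_counter() - t0)

# The *minimum* over many runs approximates the fixed per-query overhead;
# averages get polluted by scheduling noise.
floor_ms = min(samples) * 1000
print(f"best-case round trip: {floor_ms:.3f} ms")
```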
After it has its query plan it must find worker node(s) to execute it on. A virtual warehouse is a collection of worker nodes (servers/cloud VMs), with each VW size being a function of how many worker nodes it has, not necessarily the VM size of each worker (e.g. EC2 instance size). So now the compiled query gets sent off to a different machine to be run where a worker process is spun up. Similar to the query planner, the worker process is not likely optimized to run small queries quickly, so the spin-up and tear-down of that process might be involved (at least relative to say a PostgreSQL worker process).
Putting my SELECT 1; example aside in favor of a "real" query, let's talk caching. First, Snowflake does not buffer tables in memory the way a typical RDBMS does; RAM is reserved for computation. This makes sense, since in traditional usage you're dealing with tables many GBs to TBs in size, and a typical LRU cache would purge that data before it was ever accessed again anyway. This means a trip to an SSD must occur. This is where your performance starts to depend on how homogeneous or heterogeneous your API queries are. If you're lucky you get a cache hit on SSD; otherwise it's off to S3 to get your table files. Table files are not redundantly cached across all worker nodes, so while the query planner will try to schedule a computation on the node most likely to have the needed files in cache, there is no guarantee that a subsequent query will benefit from the cache left by the first query if it is assigned to a different worker node. The likelihood of this happening increases when you're firing hundreds of queries per second at the warehouse.
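The "no point buffering huge tables" argument can be seen with a toy LRU cache: a scan larger than the cache evicts everything before it is re-read, so even an exact repeat of the same scan gets zero hits. (The capacity and page counts below are invented.)

```python
from collections import OrderedDict

# Toy LRU cache of 100 pages, hit by scans of a 1,000-page table.
class LRU:
    def __init__(self, capacity):
        self.capacity, self.pages = capacity, OrderedDict()

    def access(self, page):
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)       # refresh recency
        else:
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
        return hit

cache = LRU(capacity=100)
first_scan = sum(cache.access(p) for p in range(1000))   # all misses
second_scan = sum(cache.access(p) for p in range(1000))  # still all misses:
                                                         # each page was evicted
                                                         # before being re-read
print(first_scan, second_scan)  # 0 0
```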
Lastly - and this could be the bulk of your problem, but I have saved it for last since I am the least certain of it. A small query can run on a subset of the workers in a virtual warehouse, so the warehouse can run different queries concurrently on different nodes. BUT, I am not sure whether a given worker node can process more than one query at once. If it can't, your concurrency is limited by the number of nodes in the warehouse: e.g. a warehouse with 10 worker nodes can run at most 10 queries in parallel, and what you're seeing are queries piling up at the query-planner stage while it waits for worker nodes to free up.
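If that one-query-per-worker suspicion is right, some back-of-the-envelope queueing arithmetic shows how the reported 15-30 s latencies could arise (all numbers are illustrative, taken loosely from the question, not measured Snowflake figures):

```python
# If each worker handles one query at a time, the throughput ceiling is
# workers / per-query seconds.
workers = 10
query_seconds = 0.3            # ~300 ms per small query, as observed above

capacity_qps = workers / query_seconds   # ~33 queries/sec ceiling
offered_qps = 100                        # the API's target load

# The queue grows at (offered - capacity) queries per second; after 10 s
# of sustained load the backlog alone adds ~20 s of waiting.
backlog_after_10s = (offered_qps - capacity_qps) * 10
extra_wait_s = backlog_after_10s / capacity_qps
print(round(capacity_qps, 1), round(extra_wait_s, 1))  # 33.3 20.0
```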
Maybe for this type of workload the new Snowflake feature Search Optimization Service could help you speed up performance ( https://docs.snowflake.com/en/user-guide/search-optimization-service.html ).
I have to agree with @Danny C that Snowflake is NOT designed for very low (sub-second) latency on single queries.
To demonstrate this consider the following SQL statements (which you can execute yourself):
create or replace table customer as
select *
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
limit 500000;
-- Execution time 840ms
create or replace table customer_ten as
select *
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
limit 10;
-- Execution time 431ms
I just ran this on an XSMALL warehouse, and it demonstrates that currently (November 2022) Snowflake can copy HALF A MILLION rows in 840 milliseconds - yet still takes 431 ms to copy just 10 rows.
Why is Snowflake so slow compared (for example) to on-premises Oracle 11g?
Well, here's what Snowflake has to complete:
Compile the query and produce an efficient execution plan (plans are not currently cached as they often lead to a sub-optimal plan being executed on data which has significantly increased in volume)
Resume a virtual warehouse (if suspended)
Execute the query and write results to cloud storage
Synchronously replicate the data to two other data centres (typically a few miles apart)
Return OK to the user
Oracle, on the other hand, only needs to:
Compile the query (if the query plan is not already cached)
Execute the query
Write results to local disk
If you REALLY want sub-second query performance on SELECT, INSERT, UPDATE and DELETE on Snowflake - it's coming soon. Just check out Snowflake Unistore and Hybrid Tables Explained
Hope this helps.
We are in the process of migrating our SaaS system from Oracle to Azure SQL. We anticipate that we will have less available buffer cache in Azure SQL than we have on our Oracle server (even with the sliders maxed out in Azure). Our application is quite dependent on the buffer cache for performance. Frequently run selects that cause table scans pull a lot of data into the buffer cache, and complete pages are read. We are not able to remove these table scans completely. So, to try to reduce the need for buffer cache in Azure SQL, we are considering storing nvarchar(max) values out-of-row with the command
EXEC sp_tableoption 'MYTABLE', 'large value types out of row', 1;
The idea is that this will cause less data to scan as the column values are not stored in the page and hence less data to store in the buffer cache.
Is this a correct assumption?
nvarchar(#) will cause a memory grant of #/2 for queries, as far as we understand.
What is the memory grant for nvarchar(max)?
Will memory grants compete with buffer cache memory, or will buffer cache memory have priority over memory grants?
Is there anything to gain using nvarchar(max) with out-of-row storage instead of nvarchar(#) when it comes to buffer cache usage? Or to the contrary?
We are aware of other drawbacks with nvarchar(max) like slower reads and writes (50%-100%?) and that it cannot be indexed.
Some more details:
Our application is an Applicant Tracking System. It may be that we will be fine with the default settings for nvarchar(max), but we want to be as sure as possible that Azure SQL will handle the full operational load before the launch. We are thinking that the increased I/O with out-of-row storage is small compared to the anticipated savings in query time. We are currently running stress tests to try to figure this out. A good example is the job application table, where each row consists of very little data other than the cover letter, and the table is included in a lot of queries where the cover letter column is neither part of the WHERE condition nor in the SELECT column list. In Oracle, this table is currently 90% cached, using 47 GB of the buffer cache.
We would like to inspect the buffer cache in Azure SQL in more detail, that is to see which tables are buffered and to what extent, does Azure SQL allow this?
While Oracle and SQL Server are both database engines, they have different personalities in terms of how they behave and how you tune an application to get the most out of each of them. I would not start with the kind of tuning choice you are proposing on SQL - generally, keeping small LOBs in-row is a good thing, and people rarely ever need to mess with that setting. However, there are other things that are usually more important to whether you are happy in SQL Azure having moved from another DB vendor's product. (Indexing choices are a common area where SQL Server/SQL Azure may benefit from different kinds of indexes than you are used to in another vendor's product; SQL Server often likes covering indexes more than some other products do when choosing plans in OLTP apps.)
Memory management across components in SQL is dynamic based on the needs of each component. You don't need to tweak or worry about it much. You may need to consider some tweaks to your app or to the kinds of query plans you prefer to balance the resource usage (say, CPU vs. memory vs. IO) and try to best use the HW available in the cloud environment as it may not align with your on-premises server. In some situations, we have done work to get an app to use less tempdb or to index a bit more to reduce hash joins (and thus memory requirements at runtime). Memory grants are really a function of a lot of things - not just the size of an nvarchar's max length (LOBS are more expensive than in-row by a fair amount, as you surmise).
Depending on the kind of app you have, you may benefit from things like columnstores (which are highly compressed) which may change the kinds of resources on which your app is contending.
It usually helps to post some details of the kind of app you want to move. Is it an OLTP system? If so, is it sharded or not? Is it an analytical system? You don't have to publish lots of internal details of your app, but enough to help people give more than generic advice will help.
The short answer is that you should only change that setting when the rest of your app is tuned and you have a very good reason to do so. I haven't seen anyone touch it for years and years in engagements I've had (and I do many of them).
If you have a relationship with the Microsoft field, then I would also encourage you to reach out to them - there are often ways to get direct help/advice on how to migrate an ISV/SaaS app to SQL Server/SQL Azure that may be available to you.
Best of luck on your app migration!
We have done some performance testing comparing nvarchar(max) in-row with out-of-row, using column values between 3000 and 3800 characters. Query times for table scans are down by 60% in favour of out-of-row. Buffer cache usage is also significantly lower. The read penalty is about 10% compared to in-row.
To specify Out-of-row:
EXEC sp_tableoption 'MY_TABLE', 'large value types out of row', 1;
Queries to inspect the buffer cache in detail:
https://www.sqlshack.com/insight-into-the-sql-server-buffer-cache/
My app currently connects to a RDS Multi-AZ database. I also have a Single-AZ Read Replica used to serve my analytics portal.
Recently there have been an increasing load on my master database, and I am thinking of how to resolve this situation without having to scale up my database again. The two ways I have in mind are
Move all the read queries from my app to the read-replica, and just scale up the read-replica, if necessary.
Implement ElastiCache Memcached.
To me these two options seem to achieve the same outcome, namely reducing load on my master database, but I am thinking I may have misunderstood some fundamentals, because Google doesn't seem to return any results comparing the two.
In terms of load, they have the same goal, but they differ in other areas:
Up-to-dateness of data:
A read replica will continuously sync from the master. So your results will probably lag 0 - 3s (depending on the load) behind the master.
A cache takes the query result at a specific point in time and stores it for a certain amount of time. The longer your queries are being cached, the more lag you'll have; but your master database will experience less load. It's a trade-off you'll need to choose wisely depending on your application.
Performance / query features:
A cache can only return results for queries it has already seen. So if you run the same queries over and over again, it's a good match. Note that queries must not contain changing parts like NOW(), but must be equal in terms of the actual data to be fetched.
If you have many different, frequently changing, or dynamic (NOW(),...) queries, a read replica will be a better match.
ElastiCache should be much faster, since it's returning values directly from RAM. However, this also limits the number of results you can store.
So you'll first need to evaluate how outdated your data can be and how cacheable your queries are. If you're using ElastiCache, you might be able to cache more than query results, like caching whole sections of a website instead of only the underlying queries, which should reduce the overall load on your application.
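The usual way to wire in Memcached is the cache-aside pattern. A minimal sketch follows (a plain dict stands in for ElastiCache, and run_query is a placeholder for the real database call); note that the exact query text is the cache key, which is why queries containing changing parts like NOW() never hit:

```python
import time

TTL_SECONDS = 60                 # how stale you're willing to let results get
_cache = {}                      # query text -> (expires_at, result)

def run_query(sql):
    # Placeholder for the real round trip to the master database.
    return f"result-of({sql})"

def cached_query(sql, now=None):
    now = time.monotonic() if now is None else now
    entry = _cache.get(sql)
    if entry and entry[0] > now:
        return entry[1], True            # cache hit, master untouched
    result = run_query(sql)              # cache miss, go to the DB
    _cache[sql] = (now + TTL_SECONDS, result)
    return result, False

r1, hit1 = cached_query("SELECT * FROM t WHERE id = 1")
r2, hit2 = cached_query("SELECT * FROM t WHERE id = 1")
print(hit1, hit2)  # False True
```

Raising TTL_SECONDS trades freshness for load on the master, exactly the trade-off described above.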
PS: Have you tuned your indexes? If your main problem is writes, that won't help. But if you are fighting reads, indexes are the #1 thing to check, and they do make a huge difference.
I have 2 tables (out of 12) with millions of records each, and retrieving data from these tables takes a long time. I have heard about indexing, but I think indexing is not the right approach here, because each time I need to fetch the whole record rather than 2-3 columns of it. I also tried indexing, but with it queries took longer than without, since I was fetching whole records.
So, what would be the right approach here?
I'm basing my arguments on Oracle, but similar principles probably apply to other RDBMSs. Please tag your question with the system used.
For indexing, the number of columns is mostly irrelevant; what matters more is the number of rows. But I guess you need all, or almost all, of those as well. Indexing won't help in this case, since it would just add another step to the process without reducing the amount of work being done.
So what you seem to be doing are large table scans. Those are normally not cached, because they would flush all the other useful data out of the cache. So every time you select this kind of data you have to fetch it from disk, probably also sending it over a wire. This is bound to take some time.
From what you describe, probably the best approach is to cut down on disk reads and network traffic by caching the data as near to the application as possible. Try to set up a cache on the machine running your application, possibly as part of the application itself. Read the data once, put it in the cache, and read it from there afterwards. An in-memory database would allow you to keep your SQL-based access path, if that is of any value to you.
Possibly fill the cache in the background, before anybody tries to use it.
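A minimal sketch of such a pre-warmed in-process cache (sqlite3 stands in for the real database, and the table and column names are invented):

```python
import sqlite3
import threading

# Stand-in database; check_same_thread=False lets the warm-up thread
# read from the same connection.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
db.executemany("INSERT INTO big_table VALUES (?, ?)",
               [(i, f"row-{i}") for i in range(10_000)])

cache = {}
warm = threading.Event()

def preload():
    # Read the table once, in the background, before anyone asks for it.
    for row_id, payload in db.execute("SELECT id, payload FROM big_table"):
        cache[row_id] = payload
    warm.set()

threading.Thread(target=preload, daemon=True).start()
warm.wait()

assert cache[42] == "row-42"
```

A real application would keep serving from the database until the warm event fires, then switch reads over to the cache.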
Of course this will eat up quite some memory and you have to judge if this is feasible.
A second approach would be to tune the caching settings to make the database cache those tables in memory. But be warned that this will affect the performance of the database as a whole, and not in a positive way.
A third option might be to move your processing logic into the database. It won't reduce the amount of disk I/O, but at least it would take the network out of the loop (assuming that is part of the issue).
There are a few things you can try:
Enable/increase the query cache size of the database.
memcached at the application level will increase your performance (for sure).
Tweak your queries to get the best performance, and configure the best working indexes.
Hope it helps. I tested all three with a MySQL database and a Django (Python) application, and they showed good results.
I want to replicate data from a boat offshore to an onshore site. The connection is sometimes via a satellite link and can be slow and have a high latency.
Latency in our application is important, the people on-shore should have the data as soon as possible.
There is one table being replicated, consisting of an id, datetime and some binary data that may vary in length, usually < 50 bytes.
An application off-shore pushes data (hardware measurements) into the table constantly and we want these data on-shore as fast as possible.
Are there any tricks in MS SQL Server 2008 that can help decrease the bandwidth usage and the latency? Initial testing uses a bandwidth of 100 kB/s.
Our alternative is to roll our own data transfer and initial prototyping here uses a bandwidth of 10 kB/s (while transferring the same data in the same timespan). This is without any reliability and integrity checks so this number is artificially low.
You can try out different replication profiles or create your own. Different profiles are optimized for different network/bandwidth scenarios.
MSDN talks about replication profiles here.
Have you considered getting a WAN accelerator appliance? I'm too new here to post a link, but there are several available.
Essentially, the appliance on the sending end compresses the outgoing data, and the receiving end decompresses it, all on the fly and completely invisibly. This has the benefit of increasing the apparent speed of the traffic and not requiring you to change your server configurations. It should be entirely transparent.
I'd suggest on-the-fly compression/decompression outside of SQL Server. That is, SQL Server replicates the data normally, but something in the network stack compresses it so it's much smaller and more bandwidth-efficient.
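To get a feel for what such a compressing layer buys, here is a sketch with zlib over some invented, replication-like row data (real savings depend entirely on how repetitive your stream is):

```python
import zlib

# What a WAN accelerator / compressing tunnel does conceptually: deflate
# the stream on the way out, inflate it on the way in. Repetitive row
# data (timestamps, ids, similar payloads) compresses well.
rows = b"".join(
    f"2008-06-{d:02d}T12:00:00;sensor-7;measurement={d * 1.5}\n".encode()
    for d in range(1, 31)
)

compressed = zlib.compress(rows, level=9)
restored = zlib.decompress(compressed)

assert restored == rows   # lossless round trip
print(f"{len(rows)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(rows):.0%} of original)")
```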
I don't know of anything but I'm sure these exist.
Don't mess around with the SQL files directly. That's madness if not impossible.
Do you expect it to always be only one table that is replicated? Are there many updates, or just inserts? The replication is implemented by calling an insert/update sproc on the destination for each changed row. One cheap optimization is to force the sproc name to be small. By default it is composed from the table name, but IIRC you can force a different sproc name for the article. Given an insert of around 58 bytes for a row, saving 5 or 10 characters in the sproc name is significant.
I would guess that when you update the binary field it is typically a whole replacement? If that is incorrect and you might change only a small portion, you could roll your own diff-patching mechanism, perhaps a second table that contains a time series of byte changes to the originals. It sounds like a pain, but could yield huge bandwidth savings depending on your workload.
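A hand-rolled sketch of that diff-patching idea (this is application code, not a SQL Server feature; it assumes old and new values have equal length, with whole-value replacement as the fallback otherwise):

```python
def diff(old, new):
    """Return [(offset, replacement)] runs where new differs from old.
    Assumes len(old) == len(new)."""
    patches, i = [], 0
    while i < len(old):
        if old[i] != new[i]:
            j = i
            while j < len(old) and old[j] != new[j]:
                j += 1
            patches.append((i, new[i:j]))   # one contiguous changed run
            i = j
        else:
            i += 1
    return patches

def patch(old, patches):
    data = bytearray(old)
    for offset, chunk in patches:
        data[offset:offset + len(chunk)] = chunk
    return bytes(data)

old = b"\x00" * 50
new = bytearray(old)
new[10:13] = b"\x01\x02\x03"          # 3 bytes changed out of 50
new = bytes(new)

patches = diff(old, new)
assert patch(old, patches) == new
print(patches)  # [(10, b'\x01\x02\x03')]
```

Only the (offset, bytes) pairs would need to cross the satellite link, instead of the full 50-byte value.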
Are the inserts generally done in logical batches? If so, you could store a batch of inserts as one customized blob in a replicated table, and have a secondary process that unpacks them into the final table you want to work with. This would reduce the overhead of these small rows flowing through replication.
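That pack/unpack step could look roughly like this (the framing, struct layout, and field names are invented for illustration):

```python
import struct

# Length-prefixed records: each row is packed as
# row id (uint32) + timestamp (double) + payload length (uint16) + payload.
_HEADER = struct.Struct(">IdH")

def pack_batch(rows):
    """Pack (row_id, timestamp, payload_bytes) tuples into one blob."""
    out = bytearray()
    for row_id, ts, payload in rows:
        out += _HEADER.pack(row_id, ts, len(payload)) + payload
    return bytes(out)

def unpack_batch(blob):
    """Inverse of pack_batch, run by the unpacking job on shore."""
    rows, offset = [], 0
    while offset < len(blob):
        row_id, ts, n = _HEADER.unpack_from(blob, offset)
        offset += _HEADER.size
        rows.append((row_id, ts, blob[offset:offset + n]))
        offset += n
    return rows

batch = [(1, 1228000000.0, b"\x01\x02"), (2, 1228000001.0, b"\x03" * 40)]
blob = pack_batch(batch)
assert unpack_batch(blob) == batch
```

One replicated row now carries a whole batch, so the per-row replication overhead (sproc call, row metadata) is paid once per batch instead of once per measurement.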