We store some documents in a SQL Server database VarBinary(Max) column. Most documents will be a few KB, but sometimes they may be a couple of MB.
We run into an issue when the file becomes bigger than about 4MB.
When updating the VarBinary column on an on-prem SQL Server, it is very fast (0.6 seconds for an 8 MB file).
When running the same statement against an identical database on SQL Azure, it takes more than 15 seconds!
It is also very slow when the code runs from an Azure App Service, so it's not our Internet connection that is the problem.
I know storing files in SQL Server is not the preferred way of storing them and Blob storage would normally be the best solution, but we have special reasons we need to do this, so I want to leave that out of the discussion ;-)
When investigating the execution plans, I see a "Table Spool" taking all the time and I'm not sure why. Below are the execution plans for on-prem and Azure.
Identical databases and data. If someone can help, that would be great.
Thanks Chris
The table spool operator caches the row to be updated (in tempdb) and then feeds it to the Table Update operator. Spool operators are a sign that the database engine is performing a large number of writes (8 KB pages) to tempdb.
For I/O-intensive workloads you need to scale to the Premium tiers. On the Basic and Standard tiers those updates won't perform well.
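If you prefer to change the tier from T-SQL rather than the portal, here is a minimal sketch, assuming a placeholder database name and the P1 objective; run it while connected to the logical server's master database:

    -- Scale the database to a Premium service objective; the database name
    -- and the P1 objective are placeholders, pick what fits your workload.
    ALTER DATABASE [MyDocumentDb]
    MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P1');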
Hope this helps.
Related
We have a multi-tenant environment with several hundred databases that are mostly identical schema-wise, but I'm worried that a query plan may be the fastest for one database and not for another. For example, suppose one database doesn't have a lot of data and you run a query whose plan is deemed fast enough to scan all rows, and that plan gets saved. If you then run the same query against a much larger database, will it generate and save its own plan, or use the one created against the much smaller database?
Yes!
When you have multiple instances of a database, they have the same schema, BUT each database has a different number of records, different statistics, and so on.
So it makes sense that SQL Server keeps execution plans per database.
Note: "storing" doesn't mean SQL Server writes the query plans into the database. They are kept in cache while the SQL Server service is running, there is enough memory, and the plan still needs to be kept in cache for later reuse.
We use SQL Server 2008 Web Edition on a Windows 2012 R2 server (32 GB RAM) to store data for an ASP.NET based web application. There are several databases with news tables and different views which we query regularly (SqlDataReader, Linq-to-SQL) with different joins and filter conditions. The queries themselves are long and domain-specific, so I'll skip an example.
So far everything worked fine.
Now we had to change such a query and extend it with a simple OR condition.
The result was that the number of reads and writes in the TempDB increased dramatically. Dramatically means 1000 writes of more than 100 MB per minute, which results in a total tempdb file size of currently 1.5 GB.
If we remove the OR filter statement from the original query the TempDB file I/O normalizes instantly.
However, we do not have a clue what's going on within the TempDB. We ran the Query Analyzer several times and compared the results, but its index optimization recommendations were only related to other databases' statistics and did not have any effect.
How would you narrow down this issue? Has anyone else experienced such behavior in the past? Is it likely to be a problem with the news query itself, or is it possible that we simply have to change some TempDB database properties to improve its I/O performance, e.g. autogrowth?
Start by analyzing your execution plans and running your queries with statistics enabled (or use the Profiler). The problem is not in tempdb but in your queries. You will then see where you select too many rows, which are temporarily stored in tempdb. Then you can change the queries or add the index you are missing.
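As a starting point, here is a minimal sketch of running a query with statistics enabled; the news table and filter columns are placeholders, not the actual schema from the question:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- placeholder query standing in for the real news query with the added OR
    SELECT n.Id, n.Title, n.PublishedAt
    FROM dbo.News AS n
    WHERE n.CategoryId = 3
       OR n.IsFeatured = 1;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    -- "Worktable" reads in the STATISTICS IO output correspond to spool
    -- activity in tempdb and point at the operator causing the extra I/O.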
We have a Delphi application which can connect to either Oracle or SQL Server. We use Devart components to connect to the databases, and everything is very generic when it comes to database access, i.e. we use the lowest common denominator. Ultimately we use the databases as data stores and do not use any of the more "advanced" features which may be specific to the database.
However, we have a serious performance issue with Oracle. It is to do with inserting data. I know that inserting data by running off a load of insert statements is not great for performance, but due to some business logic that needs to be done on the raw data before it gets uploaded to the database, we are a little restricted to multiple inserts. To give an idea of the performance difference, a recent test we did inserts 1000 items into our database and takes 5 minutes in SQL Server (acceptable) but 44 minutes in Oracle.
Is there anything we could do to improve performance? The inserting of data needs to be done by the user and NOT an Oracle DBA, so absolutely no Oracle skills is one of the prerequisites for any solution. Basically, the users need to press a button and everything is done.
Edit: The business logic happens before the insert (although there is a little going on during the actual insert), so a more realistic number would be 2 minutes for SQL Server and 40 or so minutes for Oracle. Bear in mind we are inserting a few large blobs per record, so perhaps that explains the slowish performance, but not why there is such a difference. The 1000 items are part of a single transaction.
Oracle supports array DML, which can speed up performance. Also, if BLOBs are involved, performance may depend on caching settings and on how the BLOBs are set up in the destination table. Some tuning of the client parameters may also be beneficial to increase network throughput.
Anyway, without knowing which version of Oracle you're using, how it is configured, your table definition (and its tablespaces), how large the BLOBs are, and the SQL actually used (did you trace it?), it's very difficult to diagnose the real problem.
Oracle has some powerful diagnostic tools to identify bottlenecks, but they may not be easy to use and require knowing enough about how Oracle works. From the Enterprise Manager Console you can access some of them in a more readable format - did you check it?
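If you want to trace it yourself, here is a minimal sketch, assuming Oracle 10g or later and sufficient privileges; it traces the current session, and you can instead pass the application session's SID and SERIAL# to SESSION_TRACE_ENABLE to trace the Delphi connection. Format the resulting trace file with tkprof to see where the time goes.

    ALTER SESSION SET tracefile_identifier = 'blob_insert_test';
    BEGIN
      -- trace the current session, including wait events and bind values
      DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => TRUE);
    END;
    /
    -- ... run the 1000-row insert test ...
    BEGIN
      DBMS_MONITOR.SESSION_TRACE_DISABLE;
    END;
    /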
Update: since I can't comment on other answers, Oracle supports different types of LOB storage:
LOBs stored in the database (under transaction management)
BFILEs: external file-system LOBs, still managed by Oracle (LOB data not under transaction control)
SecureFiles (11g onwards; like BFILEs but with transaction support and other features)
Oracle is designed for and can manage large LOBs - it just needs to be configured properly. Parameters that will affect LOB performance:
ENABLE/DISABLE STORAGE IN ROW
CACHE/NOCACHE/CACHE READS
LOGGING/NOLOGGING
CHUNK
PCTVERSION/RETENTION (especially for updates and deletes)
TABLESPACE (usually, a dedicated tablespace for lobs is advisable)
These parameters need to be set taking into account the average LOB size, how the LOBs are accessed, and how often they are modified. There's no "one size fits all".
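To make that concrete, here is a minimal sketch of a table with an explicit LOB storage clause; the SECUREFILE syntax, the table name, and the lob_data tablespace are assumptions, so pick the options that match your own access pattern:

    CREATE TABLE documents (
        doc_id    NUMBER PRIMARY KEY,
        doc_body  BLOB
    )
    LOB (doc_body) STORE AS SECUREFILE (
        TABLESPACE lob_data        -- dedicated (ASSM) tablespace for the LOB segment
        DISABLE STORAGE IN ROW     -- keep large LOBs out of the row pieces
        CACHE READS                -- cache reads, write around the buffer cache
        CHUNK 32768                -- I/O unit hint
    );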
But there is also the client side: OCI can buffer LOBs client-side, so small read/write operations are cached, minimizing the number of network round trips and LOB versioning - that's up to the OCI wrapper you're using.
Array DML (only available with FireDAC, ODAC, DOA and our SynDbOracle unit, AFAIK) won't change much if your problem is about blob transfer.
A first idea is to compress the data before transmission.
Try several access libraries. Our open-source SynDBOracle accesses the oci.dll client directly and may be slightly faster.
But perhaps the problem is on the server side. Oracle does not like transactions with huge amounts of data, since they tend to overflow its write-ahead (redo) log files. Try to tune the redo log settings used by the table.
IMHO an RDBMS is not the best option for storing huge blobs. Plain files, indexed via an RDBMS for the metadata, are usually better. Or switch to a NoSQL storage, like key/value stores or the MongoDB blob API.
Remember that both Oracle and MSSQL ask for money proportional to the data size....
What is the fastest way to back up / restore an Azure SQL database?
The background: we have a database of ~40 GB, and restoring it from a .bacpac file (~4 GB of compressed data) in the native way with the Azure SQL Database Import/Export Service takes up to 6-8 hours. Creating the .bacpac is also very slow and takes ~2 hours.
UPD. Creating a database copy (which is, by the way, transactionally consistent) using CREATE DATABASE [DBBackup] AS COPY OF [DB] takes only 15 minutes for the 40 GB database, and the restore is a simple database rename.
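Here is a minimal sketch of that copy-and-rename approach, using the placeholder names from the statement above plus a hypothetical [DB_old] name for parking the old database; run it while connected to the logical server's master database:

    -- transactionally consistent copy of the source database
    CREATE DATABASE [DBBackup] AS COPY OF [DB];

    -- ... later, to "restore", swap the names around
    ALTER DATABASE [DB] MODIFY NAME = [DB_old];
    ALTER DATABASE [DBBackup] MODIFY NAME = [DB];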
UPD. Dec 2014. Let me share our experience with the fastest DB migration scheme we ended up with.
First of all, the data-tier application (.bacpac) approach turned out not to be viable for us once the DB grew slightly bigger. It also will not work for you if you have at least one non-clustered index with a total size > 2 GB, unless you disable the non-clustered indexes before the export - this is due to the Azure SQL transaction log limit.
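A minimal sketch of that workaround, with placeholder index and table names: disable the large non-clustered indexes before the export and rebuild them afterwards.

    ALTER INDEX [IX_Documents_CustomerId] ON [dbo].[Documents] DISABLE;
    -- ... run the .bacpac export ...
    ALTER INDEX [IX_Documents_CustomerId] ON [dbo].[Documents] REBUILD;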
We stuck with the Azure Migration Wizard, which for data transfer just runs BCP for each table (the BCP parameters are configurable); it's ~20% faster than the .bacpac approach.
Here are some pitfalls we encountered with the Migration Wizard:
We ran into encoding troubles with non-Unicode strings. Make sure that the BCP import and export run with the same collation. It's the -C ... configuration switch; you can find the parameters with which BCP is called in the .config file of the MW application.
Take into account that MW (at least the version current at the time of this writing) runs BCP with parameters that will leave the constraints in a non-trusted state, so do not forget to check all non-trusted constraints after the BCP import.
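A minimal sketch of that post-import check; the table name in the ALTER statement is a placeholder, and the re-check has to be run for every table the first query returns:

    -- constraints that BCP left untrusted
    SELECT OBJECT_NAME(parent_object_id) AS table_name, name AS constraint_name
    FROM sys.foreign_keys
    WHERE is_not_trusted = 1
    UNION ALL
    SELECT OBJECT_NAME(parent_object_id), name
    FROM sys.check_constraints
    WHERE is_not_trusted = 1;

    -- revalidate all constraints on a given table so they become trusted again
    ALTER TABLE [dbo].[Orders] WITH CHECK CHECK CONSTRAINT ALL;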
If your database is 40 GB, it's long past time to consider having a redundant database server that's ready to go as soon as the main one becomes faulty.
You should have a second server running alongside the main DB server that has no actual routines except to sync with the main server on an hourly/daily basis (depending on how often your data changes, and how long it takes to run this process). You can also consider creating backups from this database server, instead of the main one.
If your main DB server goes down - for whatever reason - you can change the host address in your application to the backup database, and spend the 8 hours debugging your other server, instead of twiddling your thumbs waiting for the Azure Portal to do its thing while your clients complain.
Your database shouldn't be taking 6-8 hours to restore from backup though. If you are including upload/download time in that estimate, then you should consider storing your data in the Azure datacenter, as well as locally.
For more info see this article on Business Continuity on MSDN:
http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx
You'll want to specifically look at the Database Copies section, but the article is worth reading in full if your DB is so large.
Azure now supports point-in-time restore, geo-restore, and geo-replication (GeoDR) features. You can use a combination of these for quick backup/restore. PiTR and geo-restore come at no additional cost, while you have to pay for a geo-replica.
There are multiple ways to do backup, restore and copy jobs on Azure.
Point-in-time restore
The Azure service takes full backups, multiple differential backups, and t-log backups every 5 minutes.
Geo-restore
Same as point-in-time restore; the only difference is that it picks up a redundant copy from blob storage in a different region.
Geo-replication
Similar to SQL Availability Groups: 4 async replicas with read capabilities. Select a region to become a hot standby.
More on Microsoft Site here. Blog here.
Azure SQL Database already has these local replicas that Liam is referring to. You can find more details on these three local replicas here http://social.technet.microsoft.com/wiki/contents/articles/1695.inside-windows-azure-sql-database.aspx#High_Availability_with_SQL_Azure
Also, SQL Database recently introduced new service tiers that include new point-in-time-restore. Full details at http://msdn.microsoft.com/en-us/library/azure/hh852669.aspx
The key is also to use the right data management strategy, one that helps meet your objective. The wrong architecture and an approach of putting everything in the cloud can prove disastrous... here's more to read on it - http://archdipesh.blogspot.com/2014/03/windows-azure-data-strategies-and.html
In Oracle 8, doing an online backup with BLOBs in the database is extremely slow. By slow, I mean over an hour to back up a database with 100 MB of BLOB data. Oracle acknowledged it was slow, but wouldn't fix the problem (so much for paying for support). Does anyone know if Oracle has fixed this problem in subsequent releases? Also, how fast do online backups with BLOBs work in SQL Server and MySQL?
I've had this issue in the past, and the only decent workarounds we found were to make sure that the LOBs were in their own tablespace, and use a different backup strategy with them, or to switch to using the BFILE type. Whether or not you can get by with BFILE will depend on how you're using the LOBs.
Some usage info on BFILE:
http://download-uk.oracle.com/docs/cd/B10501_01/java.920/a96654/oralob.htm#1059942
Note that BFILEs live on the filesystem outside of Oracle, so you'd need to back them up in a process outside of your normal Oracle backup. On one project we just had a scheduled rsync to offsite backup. Also important to note is that you cannot create/update BFILEs via JDBC, but you can read them.
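For illustration, here is a minimal sketch of setting up a BFILE column, with hypothetical directory, path, table, and file names; the content stays on the filesystem and Oracle stores only a locator, which is why it needs that separate rsync-style backup:

    -- directory object pointing at the external storage location
    CREATE DIRECTORY doc_dir AS '/u01/app/doc_files';

    CREATE TABLE document_files (
        doc_id    NUMBER PRIMARY KEY,
        doc_file  BFILE
    );

    -- store a locator that references an existing file in the directory
    INSERT INTO document_files (doc_id, doc_file)
    VALUES (1, BFILENAME('DOC_DIR', 'invoice_0001.pdf'));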
To answer your question about the speed of online backups of BLOBs in SQL Server: it's the same speed as backing up regular data for SQL 2000/2005/2008 - it's typically limited by the speed of your storage. I usually get over 100 MB/sec on my database backups with BLOBs.
Be wary of using backup compression tools with those, though - if the BLOB is binary-style data that's heavily random, then you'll waste CPU cycles trying to compress the data, and compression can make the backup slower instead of faster.
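For comparison, here is a minimal sketch of a plain native backup with compression (SQL Server 2008 or later; the database and path names are placeholders), so you can measure whether compression helps or hurts with your particular BLOB content:

    BACKUP DATABASE [DocumentStore]
    TO DISK = N'D:\Backups\DocumentStore.bak'
    WITH COMPRESSION, STATS = 10;   -- progress report every 10 percent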
I use SQL Backup from Redgate for SQL Server -- it is ridiculously fast, even with my BLOB data.
I keep a copy of every file that I do EDI with, so while they aren't huge, they are numerous BLOBs. I'm well over 100 MB of just these text files.
It's important to note that Redgate's SQL Backup is just a front-end to the standard SQL backup... it gives you additional management features, basically, but still utilizes the SQL Server backup engine.
Depending on the size of the BLOBs, make sure you're storing them in-line / out of line appropriately.
See http://www.dba-oracle.com/t_table_blob_lob_storage.htm
Can you put the export file you're creating and the Oracle tablespaces on different disks? Your I/O throughput may be the constraining factor...?
exp on 8i was slow, but not as slow as you describe. I have backed up gigabytes of BLOBs in minutes on 10g (to disk, using expdp).