Speed of online backups with BLOBs - database

In Oracle 8 doing an online backup with BLOBs in the database is extremely slow. By slow, I mean over an hour to backup a database with 100MB of BLOB data. Oracle acknowledged it was slow, but wouldn't fix the problem (so much for paying for support.) Does anyone know if Oracle has fixed this problem with subsequent releases? Also, how fast do online backups work with BLOBs work in SQL Server and MySQL?

I've had this issue in the past, and the only decent workarounds we found were to make sure that the LOBs were in their own tablespace, and use a different backup strategy with them, or to switch to using the BFILE type. Whether or not you can get by with BFILE will depend on how you're using the LOBs.
Some usage info on BFILE:
http://download-uk.oracle.com/docs/cd/B10501_01/java.920/a96654/oralob.htm#1059942
Note that BFILEs live on the filesystem outside of Oracle, so you'd need to back them up in a process outside of your normal Oracle backup. On one project we just had a scheduled rsync to offsite backup. Also important to note is that you cannot create/update BFILEs via JDBC, but you can read them.

To answer your question about the speed of online backups of BLOBs in SQL Server, it's the same speed as backing up regular data for SQL 2000/2005/2008 - it's typically limited by the speed of your storage. I usually get over 100mb/sec on my database backups with BLOBs.
Be wary of using backup compression tools with those, though - if the BLOB is binary-style data that's heavily random, then you'll waste CPU cycles trying to compress the data, and compression can make the backup slower instead of faster.

I use SQL Backup from Redgate for SQL Server -- it is ridiculously fast, even with my BLOB data.
I keep a copy of every file that I do EDI with, so while they aren't huge, they are numerous and BLOBs. I'm well over 100Megs of just these text files.
It's important to note that Redgate's SQL Backup is just a front-end to the standard SQL Backup...it gives you additional management features, basically, but still utilizes the SQL Server backup engine.

Depending on the size of the BLOBs, make sure you're storing them in-line / out of line appropriately.
See http://www.dba-oracle.com/t_table_blob_lob_storage.htm

Can you put the export file you're creating and the Oracle tablespaces on different disks? You I/O throughput may be the constraining factor...?

exp on 8i was slow, but not as much as you describe. I have backed-up gigabytes of blobs in minutes in 10g..(to disk - using expdp)

Related

How can I link a SQL image column to an external database?

I need a bit of a help with the following.
Note: in the following scenario, I do not have access to the application's source code, therefore I can only make changes at the database level.
Our database uses dbo.[BLOB] to store all kinds of files and documents. The table uses an IMAGE (yeah, obsolete) data type. Since this particular table is growing quite fast, I was thinking to implement some archiving feature.
My idea is to move all files older than X months to a second database, and then somehow link from the dbo.[BLOB] table to the external/archiving database.
Is this even possible? The goal is to reduce the database size, in order to improve backup and query performance.
Any ideas and hints much appreciated.
Thanks.
Fabian
There are 2 features to help you with backup speed and database size in this case:
Filestream will allow you to store BLOBS as files on the file system instead of in database file. It complicates backup scenario, you have to backup both database and files but you get smaller database file along with faster access time to documents. It is much faster to read file from filesystem than from blob column. Additionally filestream allows for files bigger than 2GB.
Partitioning will split table into smaller chunks on physical level. This way you do not need to access application code to change where particular rows are stored physically and decide which data needs to be accessed fast and put it on SSD drive and which can land on slower archive. This way you can have more frequent backups on current partition, while less frequent on archive.
Prior to SQL Server 2016 SP1 - this feature was available in Enterprise version only. For SQL Server 2016 SP1 this is available in all editions.
In your case most likely you should go with filestream first.
W/o modifying the application you can do, basically, nothing. You may try to see if changing the column type will be tolerated by the application (very unlikely, 99.99% it will break the app) and try to use FILESTREAM, but even if you succeed it does not give much benefits (backup size will be the same, for example).
A second thing you can try is to replace the table with a view, using INSTEAD OF triggers for updates. It is still very likely to break the application (lets say 99.98%). The goal would be to have a distributed partitioned view (or cross DB partitioned view) which presents to the application an unified view of the 'cold' and 'hot' data. Is complex, error prone, but it will reduce the size of the backups (as long as data is moved from hot to cold and cold data is immutable, requiring few backups).
The goal is to reduce the database size, in order to improve backup and query performance.
To reduce the backup size, as I explained above, you can do, basically, nothing. But performance you need to investigate it and address it appropriately, based on your findings. Saying the the database is slow 'because of BLOBs' is hand-waving.

Oracle performance when doing bulk inserts from Delphi

We have a Delphi application which can connect to either Oracle or SQL Server. We use Devart components to connect to the databases, and everything is very generic when it comes to database access. i.e. we use the lowest common denominator. Ultimately we use the databases as data stores and do not use any of the more "advanced" features which maybe specific to the database.
However we have a serious performance issue with Oracle. It is to do with inserting data. I know that inserting data by running off a load of insert statements is not great for performance, but due to some business logic that needs to be done on the raw data before it gets uploaded to the database, we are a little restricted to multiple inserts. To get an idea of performance differences, a recent test we did, inserts 1000 items into our database and takes 5 minutes in SQL Server (acceptable) but 44 minutes in Oracle.
Is there anything we could do to improve performance? The inserting of data needs to be done by the user and NOT an Oracle DBA, so absolutely no Oracle skills is one of the pre-requisites for any solution. Basically, the users need to press a button and everything is done.
Edit: Business Logic happens before the insert (although there is a little going on during the actual insert, so more realistic number would be 2 minutes for SQL Server and 40 or so minutes for Oracle. Bear in mind we are inserting a few large blobs per record, so perhaps that explains the slowish performance, but not why there is such a difference. The 1000 items are part of a transaction.
Oracle supports array DML, which can speed up performance. Also if BLOB are involved, performance may depend on caching settings, and how the BLOB are setup in the destination table. Some db client parameters tuning may be also beneficial to increase network speed.
Anyway, without knowing which version of Oracle you're using, how it is configured, your table(s) deinition (and its tablespaces), how large are the BLOBS, and the SQL actually used (did you trace it?), it's very difficult to diagnose the real problem.
Oracle has some powerful diagnostic tools to identify bottlenecks, but they may not be easy to use and require to know enough about how Oracle works. From the Enterprise Manager Console you can access some of them in a more readable format - did you check it?
Update: because I can't comment to other answers, Oracle support differet type of LOB storage:
LOBs stored into the database (under transaction managment)
BFILES, external file system LOBs yet still managed by Oracle (LOB data not under transaction)
SecureFiles (11g onwards, alike BFILES but with transaction support and other features)
Oracle is designed for and can manage large LOBs - just it needs to be configured properly. Parameter that will affect LOB performance:
ENABLE/DISABLE STORAGE IN ROW
CACHE/NOCACHE/CACHE READS
LOGGING/NOLOGGING
CHUNK
PCTVERSION/RETENTION (especially for updates and deletes)
TABLESPACE (usually, a dedicated tablespace for lobs is advisable)
These parameters needs to be set taking into account the average LOB size, how LOBs are accessed, amd how often are modified. There's no "one size fits all".
But there are also the client side: OCI can buffer LOBs client side, so small read/write operations are cached, minimizing the number of network roundtrips and LOB versioning - that's up to the OCI wrapper you're using.
Array DML (only available with FireDac, ODAC, DOA and our SynDbOracle unit afaik) won't change much if your problem is about blob transfer.
First idea is to compress the data before transmission.
Try several access libraries. Our open source SynDBOracle directly accesses the oci.dll client but may be slightly faster.
But perhaps the problem may be on the server side. Oracle does not like transactions with huge data, since it tends to overflow its wal files. Try to tune the write ahead log files of the table.
IMHO a rdbms is not the best option to store huge blobs. Plain files, indexed via a rdbms for metadata is usually better. Or switch to a big SQL storage, like key/value stores or mongodb blob api.
Remember that both Oracle and mssql do ask money proportional to the data size....

Sql Server Backup Sizes

Does anyone have real world experience of backup compression in Sql 2008, I'd like to know how it performs compared to Sql 2005 + winrar. Currently when I back up a I get a 14G file. Using rar shrinks it to <400M, a massive saving. Most of the data is similar, being figures from the stock market so I guess that makes it easy to compress.
The figures I have seen suggest compression ratios in the range 3 - 4.5 times smaller, which is not as good as the figures you quoted for rar. See Tuning the Performance of Backup Compression in SQL Server 2008.
On a side note, creating compressed backups is faster.
Time to restore is an important thing for backups. If you compress your backups, you will spent more time to restore it.
I've got one that goes from 200GB down to 50GB, definitely not as compact as WinRar is getting you, but a lot more convenient. As others have noted, backup time is quicker, but restore time is (a bit) longer but still way quicker than SQL2000.

Using SQL Server as Image store

Is SQL Server 2008 a good option to use as an image store for an e-commerce website? It would be used to store product images of various sizes and angles. A web server would output those images, reading the table by a clustered ID. The total image size would be around 10 GB, but will need to scale. I see a lot of benefits over using the file system, but I am worried that SQL server, not having an O(1) lookup, is not the best solution, given that the site has a lot of traffic. Would that even be a bottle-neck? What are some thoughts, or perhaps other options?
10 Gb is not quite a huge amount of data, so you can probably use the database to store it and have no big issues, but of course it's best performance wise to use the filesystem, and safety-management wise it's better to use the DB (backups and consistency).
Happily, Sql Server 2008 allows you to have your cake and eat it too, with:
The FILESTREAM Attribute
In SQL Server 2008, you can apply the FILESTREAM attribute to a varbinary column, and SQL Server then stores the data for that column on the local NTFS file system. Storing the data on the file system brings two key benefits:
Performance matches the streaming performance of the file system.
BLOB size is limited only by the file system volume size.
However, the column can be managed just like any other BLOB column in SQL Server, so administrators can use the manageability and security capabilities of SQL Server to integrate BLOB data management with the rest of the data in the relational database—without needing to manage the file system data separately.
Defining the data as a FILESTREAM column in SQL Server also ensures data-level consistency between the relational data in the database and the unstructured data that is physically stored on the file system. A FILESTREAM column behaves exactly the same as a BLOB column, which means full integration of maintenance operations such as backup and restore, complete integration with the SQL Server security model, and full-transaction support.
Application developers can work with FILESTREAM data through one of two programming models; they can use Transact-SQL to access and manipulate the data just like standard BLOB columns, or they can use the Win32 streaming APIs with Transact-SQL transactional semantics to ensure consistency, which means that they can use standard Win32 read/write calls to FILESTREAM BLOBs as they would if interacting with files on the file system.
In SQL Server 2008, FILESTREAM columns can only store data on local disk volumes, and some features such as transparent encryption and table-valued parameters are not supported for FILESTREAM columns. Additionally, you cannot use tables that contain FILESTREAM columns in database snapshots or database mirroring sessions, although log shipping is supported.
Check out this white paper from MS Research (http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2006-45)
They detail exactly what you're looking for. The short version is that any file size over 1 MB starts to degrade performance compared to saving the data on the file system.
I doubt that O(log n) for lookups would be a problem. You say you have 10GB of images. Assuming an average image size of say 50KB, that's 200,000 images. Doing an indexed lookup in a table for 200K rows is not a problem. It would be small compared to the time needed to actually read the image from disk and transfer it through your app and to the client.
It's still worth considering the usual pros and cons of storing images in a database versus storing paths in the database to files on the filesystem. For example:
Images in the database obey transaction isolation, automatically delete when the row is deleted, etc.
Database with 10GB of images is of course larger than a database storing only pathnames to image files. Backup speed and other factors are relevant.
You need to set MIME headers on the response when you serve an image from a database, through an application.
The images on a filesystem are more easily cached by the web server (e.g. Apache mod_mmap), or could be served by leaner web server like lighttpd. This is actually a pretty big benefit.
For something like an e-commerce web site, I would be moe likely to go with storing the image in a blob store on the database. While you don't want to engage in premature optimization, just the benefit of having my images be easily organized alongside my data, as well as very portable, is one automatic benefit for something like ecommerce.
If the images are indexed then lookup won't be a big problem. I'm not sure but I don't think the lookup for file system is O(1), more like O(n) (I don't think the files are indexed by the file system).
What worries me in this setup is the size of the database, but if managed correctly that won't be a big problem, and a big advantage is that you have only one thing to backup (the database) and not worry about files on disk.
Normally a good solution is to store the images themselves on the filesystem, and the metadata (file name, dimensions, last updated time, anything else you need) in the database.
Having said that, there's no "correct" solution to this.

DB2 Online Database Backup

I have currently 200+ GB database that is using the DB2 built in backup to do a daily backup (and hopefully not restore - lol) But since that backup now takes more than 2.5 hours to complete I am looking into a Third party Backup and Restore utility. The version is 8.2 FP 14 But I will be moving soon to 9.1 and I also have some 9.5 databases to backup and restore. What are the best tools that you have used for this purpose?
Thanks!
One thing that will help is going to DB2 version 9 and turn on compression. The size of the backup will then decrease (by up to 70-80% on table level) which should shorten the backup time. Of course, if your database is continuosly growing you'll soon run into problems again, but then data archiving might be the thing for you.
Before looking at third party tools, which I doubt would help too much, I would consider a few optimizations.
1) Have you used REORG on your tables and indexes? This would compact the information and minimize the amount of pages used;
2) If you can, backup on multiple disks at the same time. This can easily be achieved by running db2 backup db mydb /mnt/disk1 /mnt/disk2 /mnt/disk3 ...
3) DB2 should do a good job at fine tuning itself, but you can always experiment with the WITH num_buffers BUFFERS, BUFFER buffer-size and PARALLELISM n options. But again, usually DB2 does a better job on its own;
4) Consider performing daily incremental backups, and a full backup once on Saturdays or Sundays;
5) UTIL_IMPACT_PRIORITY and UTIL_IMPACT_LIM let you throttle the backup process so that it doesn't affect your regular workload too much. This is useful if your main concern is not the time per se, but rather the performance of your datasever while you backup;
6) DB2 9's data compression can truly do wonders when it comes to reducing the dimensions of the data that needs to be backed up. I have seen very impressive results and would highly recommend it if you can migrate to version 9.1 or, even better, 9.5.
There really are only two ways to make backup, and more important recovery, run faster:
1. backup less data and/or
2. have a bigger pipe to the backup media
I think you got a lot of suggestions on how to reduce the amount of data that you back up. Basically, you should be creating a backup strategy that relies on relatively infrequent full backup and much more frequent backups of changed (since last full backup) data. I encourage you to take a look at the "Configure Automatic Maintenance" wizard in the DB2 Control Center. It will help you with creating automatic backups and with other utilities like REORG that Antonio suggested. Things like compression obviously can help as the amount of data is much lower. However, not all DB2 editions offer compression. For example, DB2 Express-C does not. Frankly, doing compression on a 200GB database may not be worth while anyway and that is precisely why free DBMS like DB2 Express-C don't offer compression.
As far as openign a bigger pipe for your backup you first have to decide if you are going to backup to disk or to tape. There is a big difference in speed (obviously disk is a lot faster). Second, DB2 can paralelize backups. So, if you have multiple devices to back to, it will backup to all of them at the same time i.e. your elapsed time will be a lot less depending how many devices you have to throw at the problem. Again, DB2 Control Center can help you have it set up.
Try High Performance Unload (HPU) - this was a standalone product from Infotel is now available as part of the Optim data studio - posting here https://www.ibm.com/developerworks/mydeveloperworks/blogs/idm/date/200910?lang=en
It's not a "third-party" product but anyone that I have ever seen using DB2 is using Tivoli Storage Manager to store their database backups.
Most shops will set up archive logging to TSM so you only have to take the "big" backup every week or so.
Since it's also an IBM product you won't have to worry about it working with all the different flavors of DB2 that you have.
The downside is it's an IBM product. :) Not sure if that ($) makes a difference to you.
I doubt that you can speed things up using another backup tool. As Mike mentions, you can add TSM to the stack, but that will hardly make the backup run any faster.
If I were you, I'd look into where backup files are stored: Are they using the same disk spindles as the database itself? If so: See if you can store the backup files on a storage area which isn't contented for access during your backup window.
And consider using incremental backups for daily backups, and then a long full backup on Saturdays.
(I assume that you are already running online backups, so that your data aren't unavailable during backup.)
A third party backup package probably won't help your speed much. Making sure that you are not doing full backups every 2 hours is probably the first step.
After that, look at where you are writing your backup to. Is it a local drive, instead of a network drive? Are the spindles used for anything else? Backups don't involve a lot of seek activity, but do involve a lot of big writes, so you probably want to avoid RAID 5 and go for large stripe sizes to help maximize throughput.
Naturally, you have to do full backups sooner or later, but hopefully you can find a window when load is light and you can live with a longer time period between backups. Do your full backup during a 4-6 hour period when the normal incrementals are off and then do incrementals based off of that the rest of the time.
Also, until you get your backup copied to a completely separate system you really aren't backed up. You'll have to experiment to figure out if you're better off compressing it before, during or after sending.

Resources