SQL Server FILESTREAM performance

SQL Server FILESTREAM has some known limitations:
1) Database mirroring does not support FILESTREAM.
2) For clustering, FILESTREAM filegroups must be put on a shared disk, which defeats the purpose of creating the cluster.
Given these limitations of FILESTREAM, is it advisable to build a FILESTREAM solution? I'm looking to save and retrieve 0.5 million files in a FILESTREAM database (approx. 1 TB of disk) that would be accessed simultaneously by approx. 2,000 users. Given that FILESTREAM cannot be clustered or mirrored, how does one devise a scalable solution?
If I live with a non-scalable solution, what would the performance of such a system be? Can I serve up, say, 100 users with 100 1 MB files within a 5-second window?

Reality check: issue 2 is a non-issue. In a cluster, ALL data must be on shared discs; otherwise the failover node cannot access the data. If that defeats the purpose of a cluster, you are invited to try installing a SQL Server cluster without shared discs. ALL data storage on clusters must be on shared discs. It has been like this since the cluster service was first created for Windows.
Which basically makes your conclusions already quite - hm - wrong.
You also need to do some mathematics. 5th-grade style.
Can I serve up, say, 100 users with 100 1 MB files within a 5-second window?
Ignore SQL Server for a moment. Depending on how I read this, it is either 100 MB or 10,000 MB. Anyhow, 100 MB in 5 seconds = 20 MB per second, which is around 160 Mbit/s on the wire. This is serious traffic. We are talking a minimum of 250 to 300 Mbit of needed external bandwidth.
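To spell the arithmetic out under both readings (an illustration only, ignoring protocol overhead):
100 users x 1 file x 1 MB = 100 MB / 5 s = 20 MB/s ~ 160 Mbit/s
100 users x 100 files x 1 MB = 10,000 MB / 5 s = 2,000 MB/s ~ 16 Gbit/s
The second reading is far beyond a single commodity network link, before SQL Server even enters the picture.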

Related

SharePoint stores files in database 10 times larger than the real file size

There is a problem with the SharePoint database "WSS_Content_".
I've got a simple document library in my SharePoint site. When I add a file of a specific size (e.g. 1 MB), SharePoint stores the file in the .mdf file at 10 times the original size (1 GB). I got this by checking the file size in the AllDocs table. As a result, the database size has grown from 78 GB to 240 GB.
Also, shrinking the database didn't help.
Any idea to fix my SharePoint database is greatly appreciated.
You should expect the database to be larger than the sum of your content. The physical database size includes not just content (which itself includes deleted documents/items that haven't been flushed from site recycle bins), but also transaction logs, permissions, table metadata, the database schema, indexes, and any pre-allocated space for future growth.
A database of 240 GB for only 78 GB of content does seem quite large (68% overhead sounds excessive), so you might want to look into defragmentation and shrink operations. You should verify how your SQL Server is configured in terms of pre-allocation of space; this can cause sudden large spikes in storage consumption when SQL decides it needs more storage for future growth (even though it's not consuming it with data just yet).
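A quick way to inspect the current file sizes and the growth settings is a query like this (a sketch; run it in the context of the content database):
SELECT name, type_desc,
       size * 8 / 1024 AS size_mb,   -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files;
A large fixed-MB or percent growth increment on a file of this size would explain sudden jumps in allocated (but empty) space.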
All that being said, your screenshot suggests that your math is off by a factor of ten; 1457664 bytes is only 1.457664 MB, not very close to 1 GB.
By the way, you can save much more space in the database if you turn off versioning, because for each document it stores all its versions separately.

Best practice in choosing the drives for data and log files while installing SQL Server in a prod environment?

I need to install SQL Server for a prod environment. There are only two drives in the system: one with 120 GB and another with 50 GB. How should I choose the drives for the user-defined database data files, log files, and tempdb files?
Your question is too broad to have a simple answer.
Take these points into consideration:
What is the size of the user database?
What is the expected growth of the user database?
Do you have a lot of queries with #-tables? (tempdb strain)
What is the expected transaction count per second/minute?
Use SQLIO to measure the speed of your drives.
Do you have load tests ready? (Use them, watch Resource Monitor, and check disk queues.)
Which recovery model are you using? (growth of your log files)
Which backup strategy are you planning?
It is equally possible that:
you don't have to worry, with all DBs in the default location, or
you need faster hardware.
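Either way, when you create the user databases it's worth making the file placement explicit rather than relying on the defaults. A minimal sketch (drive letters, names, and sizes are hypothetical; adjust to your 120 GB / 50 GB layout):
-- Data file on one drive, log file on the other
CREATE DATABASE HelpDesk
ON PRIMARY (
    NAME = HelpDesk_data,
    FILENAME = 'D:\SQLData\HelpDesk.mdf',
    SIZE = 500MB, FILEGROWTH = 100MB)
LOG ON (
    NAME = HelpDesk_log,
    FILENAME = 'E:\SQLLogs\HelpDesk_log.ldf',
    SIZE = 100MB, FILEGROWTH = 50MB);
Pre-sizing the files and using fixed-MB growth avoids frequent small autogrow events later.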

SQL Server scalability question

We are trying to build an application which will have to store billions of records: 1 trillion+.
A single record will contain text data and metadata about the text document.
Please help me understand the storage limitations. Can a database (SQL Server or Oracle) support this much data, or do I have to look for some other filesystem-based solution? What are my options?
Since the central server has to handle incoming load from many clients, how will parallel insertions and search scale? How do I distribute data over multiple databases or tables? I am a little green on database specifics for such a scaled environment.
Initially, while filling the database, the insert load will be high; later, as the database grows, search load will increase and inserts will reduce.
The total size of data will cross 1,000 TB.
Thanks.
1 trillion+. A single record will contain text data and metadata about the text document. Please help me understand the storage limitations.
I hope you have a BIG budget for hardware. This is big as in "millions".
A trillion documents, at 1,024 bytes of total storage per document (VERY unlikely to be realistic when you say "text"), is about 950 terabytes of data. "Storage limitations" means you are talking high-end SAN here. Using a non-redundant setup of 2 TB discs, that is roughly 475 discs. Do the math. Adding redundancy/RAID on top of that, and you are talking a major hardware investment. And this assumes only 1 KB per document. If you average 16 KB per document, this is... roughly 7,600 2 TB discs.
That is a hardware problem to start with. SQL Server does not scale that high, and you cannot do that in a single system anyway. The normal approach for a document store like this would be a clustered storage system (a clustered or somehow distributed file system) plus a central database for the keywords/tagging, possibly with replication of the database for distributed search, depending on load/inserts.
Whatever it is going to be, the storage/backup requirements are enormous. Large project here, large budget.
IO load is going to be another issue, hardware-wise. You will need a large machine and a TON of IO bandwidth into it. I have seen 8 Gb links overloaded on a SQL Server (fed by an HP EVA with 190 discs), and I can imagine you will run into something similar. You will want hardware with as much RAM as technically possible, regardless of the price - unless you store the blobs outside.
SQL row compression may come in VERY handy. Full-text search will be a problem.
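Row compression, for what it's worth, is a one-statement change per table (a sketch; the table name is hypothetical):
-- Rebuild the table with row compression enabled (SQL Server 2008+)
ALTER TABLE dbo.Documents REBUILD WITH (DATA_COMPRESSION = ROW);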
The total size of data will cross 1,000 TB.
No. Seriously. It will be bigger, I think. 1,000 TB would assume the documents are small - like the XML form of a travel ticket.
According to the MSDN page on SQL Server limitations, it can accommodate 524,272 terabytes in a single database - although it can only accommodate 16TB per file, so for 1000TB, you'd be looking to implement partitioning. If the files themselves are large, and just going to be treated as blobs of binary, you might also want to look at FILESTREAM, which does actually keep the files on the file system, but maintains SQL Server notions such as Transactions, Backup, etc.
All of the above is for SQL Server. Other products (such as Oracle) should offer similar facilities, but I couldn't list them.
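To give a flavor of what partitioning looks like in SQL Server, here is a minimal sketch (all names and boundary values are hypothetical; in practice each partition would map to its own filegroup rather than ALL TO PRIMARY):
-- Range-partition documents by ingest date, one partition per year
CREATE PARTITION FUNCTION pfByYear (date)
AS RANGE RIGHT FOR VALUES ('2010-01-01', '2011-01-01', '2012-01-01');

CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.Documents (
    DocId      bigint         NOT NULL,
    IngestDate date           NOT NULL,
    Body       varbinary(max) NULL,
    -- the partitioning column must be part of the clustered key
    CONSTRAINT PK_Documents PRIMARY KEY (DocId, IngestDate)
) ON psByYear (IngestDate);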
In the SQL Server space you may want to take a look at SQL Server Parallel Data Warehouse, which is designed for 100s TB / Petabyte applications. Teradata, Oracle Exadata, Greenplum, etc also ought to be on your list. In any case you will be needing some expert help to choose and design the solution so you should ask that person the question you are asking here.
When it comes to databases at this scale, it's quite tricky, and there can be multiple components involved to get performance, like a Redis cache, sharding, read replicas, etc.
The post below describes simplified DB scalability options:
http://www.cloudometry.in/2015/09/relational-database-scalability-options.html

Help with SQL Server Log, Master and TempDB locations on specific array config

I have a quick question that is specific to the server setup we have available to us.
We are getting ready to migrate our help desk to a new server, as well as convert a small ACT DB to the full version of SQL 2005 Standard (from SQL Express).
Right now, we only have the following resources available to us as far as array configurations go.
It is server 2008 64 standard, and we will be using SQL 2005 Standard 64.
2 drives in raid 1+0 for the OS (1)
3 drives in raid 5, (2)
and 3 additional drives to allocate out for additional resources. (3)
My initial plans were to install ACT, our Help Desk and the SQL Program files and transaction log files on (2), and use (3) in raid 0 for the tempDB.
The DB sizes are very small, and honestly we could probably run everything on the first 2 arrays with minimal performance loss (just because the DBs are so small).
However we may decide to dedicate this server to SQL somewhere down the line, moving many more DB's over to it, and remove the help desk (web front end) to another server.
How intensive are the log file write operations for 2 small (<500MB) db's?
How risky is putting the TempDB on a raid 0?
Would moving the log files to the system array (1) improve performance?
With 8 disks available I'd recommend the following independent RAID arrays:
OS: RAID 1 (2 disks) (you specified RAID 10 in your question - you can't do RAID 10 with only two drives).
Database data files (including TempDB data file and log file): RAID 5 (4 disks).
Database log files: RAID 1 (2 disks).
If you think the usage of your databases will increase or you're planning to consolidate further databases you may also consider adding further disks to split out your TempDB database.
If disk space capacity isn't an issue you could consider changing the 4 disk RAID 5 array to RAID 10 which would be more performant (particularly for writes).
How are you planning to do backups?
1. How intensive are the log file write operations for 2 small (<500MB) db's?
This depends on what your application is doing and the number of concurrent connections.
2. How risky is putting the TempDB on a raid 0?
Putting TempDB on RAID 0 is risky: if you lose the TempDB database because one of the disks fails, your SQL instance will stop.
3. Would moving the log files to the system array (1) improve performance?
Yes, but putting them on their own independent array would be more performant and resilient.
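If you do later move TempDB to its own array, the relocation itself is simple (a sketch; T: is a hypothetical drive letter, and the logical names assume the installation defaults):
-- Point the TempDB files at the new array
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');
-- The move takes effect after the SQL Server service restarts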
You should really ask this question on serverfault.com (its content is skewed towards administration rather than programming).

Using SQL Server as Image store

Is SQL Server 2008 a good option to use as an image store for an e-commerce website? It would be used to store product images of various sizes and angles. A web server would output those images, reading the table by a clustered ID. The total image size would be around 10 GB, but it will need to scale. I see a lot of benefits over using the file system, but I am worried that SQL Server, not having an O(1) lookup, is not the best solution given that the site has a lot of traffic. Would that even be a bottleneck? What are some thoughts, or perhaps other options?
10 GB is not a huge amount of data, so you can probably use the database to store it and have no big issues; of course, it's best performance-wise to use the filesystem, and safety/management-wise it's better to use the DB (backups and consistency).
Happily, SQL Server 2008 allows you to have your cake and eat it too, with:
The FILESTREAM Attribute
In SQL Server 2008, you can apply the FILESTREAM attribute to a varbinary column, and SQL Server then stores the data for that column on the local NTFS file system. Storing the data on the file system brings two key benefits:
Performance matches the streaming performance of the file system.
BLOB size is limited only by the file system volume size.
However, the column can be managed just like any other BLOB column in SQL Server, so administrators can use the manageability and security capabilities of SQL Server to integrate BLOB data management with the rest of the data in the relational database—without needing to manage the file system data separately.
Defining the data as a FILESTREAM column in SQL Server also ensures data-level consistency between the relational data in the database and the unstructured data that is physically stored on the file system. A FILESTREAM column behaves exactly the same as a BLOB column, which means full integration of maintenance operations such as backup and restore, complete integration with the SQL Server security model, and full-transaction support.
Application developers can work with FILESTREAM data through one of two programming models; they can use Transact-SQL to access and manipulate the data just like standard BLOB columns, or they can use the Win32 streaming APIs with Transact-SQL transactional semantics to ensure consistency, which means that they can use standard Win32 read/write calls to FILESTREAM BLOBs as they would if interacting with files on the file system.
In SQL Server 2008, FILESTREAM columns can only store data on local disk volumes, and some features such as transparent encryption and table-valued parameters are not supported for FILESTREAM columns. Additionally, you cannot use tables that contain FILESTREAM columns in database snapshots or database mirroring sessions, although log shipping is supported.
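To make that concrete, here is a minimal T-SQL sketch of a FILESTREAM-backed image table (database name, paths, and columns are hypothetical; FILESTREAM must also be enabled for the instance in SQL Server Configuration Manager first):
-- Allow full T-SQL and Win32 streaming access to FILESTREAM data
EXEC sp_configure filestream_access_level, 2;
RECONFIGURE;

-- Database with a dedicated FILESTREAM filegroup; blobs live under ImageStoreFS
CREATE DATABASE ImageStore
ON PRIMARY (NAME = ImageStore_data, FILENAME = 'D:\Data\ImageStore.mdf'),
FILEGROUP ImagesFS CONTAINS FILESTREAM
    (NAME = ImageStore_fs, FILENAME = 'D:\Data\ImageStoreFS')
LOG ON (NAME = ImageStore_log, FILENAME = 'D:\Data\ImageStore_log.ldf');

USE ImageStore;
-- FILESTREAM tables require a uniqueidentifier ROWGUIDCOL with a unique constraint
CREATE TABLE dbo.ProductImage (
    ImageId   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    ProductId int NOT NULL,
    ImageData varbinary(max) FILESTREAM NULL
);
The ImageData column can then be read and written with ordinary T-SQL, or streamed via the Win32 API for larger images.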
Check out this white paper from MS Research (http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2006-45)
They detail exactly what you're looking for. The short version is that any file size over 1 MB starts to degrade performance compared to saving the data on the file system.
I doubt that O(log n) for lookups would be a problem. You say you have 10GB of images. Assuming an average image size of say 50KB, that's 200,000 images. Doing an indexed lookup in a table for 200K rows is not a problem. It would be small compared to the time needed to actually read the image from disk and transfer it through your app and to the client.
It's still worth considering the usual pros and cons of storing images in a database versus storing paths in the database to files on the filesystem. For example:
Images in the database obey transaction isolation, automatically delete when the row is deleted, etc.
Database with 10GB of images is of course larger than a database storing only pathnames to image files. Backup speed and other factors are relevant.
You need to set MIME headers on the response when you serve an image from a database, through an application.
The images on a filesystem are more easily cached by the web server (e.g. Apache mod_mmap), or could be served by a leaner web server like lighttpd. This is actually a pretty big benefit.
For something like an e-commerce web site, I would be more likely to go with storing the images in a blob store in the database. While you don't want to engage in premature optimization, just having my images easily organized alongside my data, as well as very portable, is one automatic benefit for something like e-commerce.
If the images are indexed, then lookup won't be a big problem. I'm not sure, but I don't think filesystem lookup is O(1); it's more like O(n) (I don't think files are indexed by the file system).
What worries me in this setup is the size of the database, but if managed correctly that won't be a big problem, and a big advantage is that you have only one thing to back up (the database) rather than having to worry about files on disk.
Normally a good solution is to store the images themselves on the filesystem, and the metadata (file name, dimensions, last updated time, anything else you need) in the database.
Having said that, there's no "correct" solution to this.