Is it true that using an SSD instead of an ordinary HDD on a server improves the performance of Microsoft SQL Server FILESTREAM operations, such as streaming a video stored in the database?
If so, how large is the performance difference for FILESTREAM operations, and is it worth spending the extra money on an SSD for the server?
There is a dramatic performance increase. The amount of difference depends on many other considerations, from the size of the files to the other hardware being used, and on how much traffic you will have at peak and on average.
Is it worth it? That is a question only you can answer. Many applications were built and performed well before SSDs became available. Many continue to be built and perform well using HDDs, even now that SSDs are relatively affordable.
I found the following to be a good article on the topic:
An HDD might be the right choice if:
- You need lots of storage capacity, up to 6TB (though with SMR technology, newer drives can have up to 10TB)
- You don't want to spend much money
- You don't care too much about how fast a computer boots up or opens programs
An SSD might be the right choice if:
- You are willing to pay for faster performance
- You don't mind limited storage capacity, or can work around that (again, SSDs are working on this "con")
http://www.storagereview.com/ssd_vs_hdd
Here's my question:
Databases usually save data to disk, whether SQL or NoSQL; this is the most common arrangement. However, in a cloud environment machines typically ship with only a modest amount of storage, and as the application is used this fills up and becomes a problem. Although you can vertically scale the storage (adding more disks, e.g. mounts), I understand that in my case vertical scaling is not a long-term solution.
What is the best scale out solution for databases?
For example, when ordering cloud machines you typically get just enough disk for each machine, say 50GB.
So if we're targeting a minimum capacity of 1TB, will we need to run around 20 machines? And for 10TB, roughly ten times as many?
How do you, from day one, use a scalable database without worrying about running out of disk space? (Given that you can spin up more machines as needed from the cloud provider's dashboard.)
A database is usually the storage layer for most applications. Our company also performs a lot of calculations and data manipulation on that data on a daily basis.
As we accumulate more and more data, data generation has become an issue because it takes too long. I think it can make sense to separate the database into at least two:
one for storing data, with a focus on read/write performance;
one for calculations, with a focus on data aggregation performance.
Does anybody have similar experience who can say whether this idea is good, and what the design differences would be between the two?
Maybe it is worth looking at a NoSQL solution for the calculation workload, e.g. in-memory databases?
it can make sense to separate the database into at least two
If the databases are on different disks (with different spindles), it may help; otherwise you gain nothing, because disk IO is shared between the databases.
For best practices, read Storage Top 10 Best Practices.
Maybe it is worth looking at a NoSQL solution for the calculation workload, e.g. in-memory databases?
No need to go to a NoSQL solution; you can use in-memory tables.
In-Memory OLTP can significantly improve the performance of transaction processing, data load and transient data scenarios.
For more details, see In-Memory OLTP (In-Memory Optimization).
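As a minimal sketch (table and filegroup names here are illustrative, not from the docs), a memory-optimized table looks like a regular table plus a MEMORY_OPTIMIZED option; the database first needs a MEMORY_OPTIMIZED_DATA filegroup:

```sql
-- One-time setup: the database needs a memory-optimized filegroup.
-- ALTER DATABASE MyDb ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
-- ALTER DATABASE MyDb ADD FILE (NAME = 'imoltp_file', FILENAME = 'C:\Data\imoltp')
--     TO FILEGROUP imoltp_fg;

CREATE TABLE dbo.CalcStaging
(
    Id       INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    Amount   DECIMAL(18, 2) NOT NULL,
    LoadedAt DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON,
      DURABILITY = SCHEMA_AND_DATA); -- or SCHEMA_ONLY for transient staging data
```

SCHEMA_AND_DATA keeps the data durable across restarts; SCHEMA_ONLY trades durability for even faster transient scenarios.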
Other Strategies
1) Tune tempdb
Tempdb is shared by all databases and is heavily used in calculations.
A pragmatic approach is a 1:1 mapping between tempdb data files and logical CPUs (cores), up to eight.
For more details: SQL Server TempDB Usage, Performance, and Tuning Tips
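For example, on a server with four cores, the 1:1 mapping could be sketched like this (file names, sizes, and the T:\ path are illustrative):

```sql
-- One tempdb data file per logical CPU (4 cores in this example).
-- Keep all files the same size so proportional fill stays balanced.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
```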
2) Evaluate the Page Life Expectancy (PLE) counter and take action to improve it
To evaluate the data cache, run the following query:
SELECT [object_name],
       [counter_name],
       [cntr_value]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
  AND [counter_name] = 'Page life expectancy'
The recommended value of the PLE counter (in seconds) is greater than:
(memory dedicated to SQL Server, in GB / 4) * 300
For example, a server with 64GB dedicated to SQL Server should sustain a PLE above 64 / 4 * 300 = 4800 seconds.
Page Life Expectancy is the number of seconds a page will stay in the buffer pool without being referenced. In simple words, if your pages stay longer in the buffer pool (the memory cache area), your PLE is higher, leading to higher performance: every time a request comes in, there is a chance it will find its data in the cache instead of going to the hard drive to read it.
If PLE isn't high enough, increase memory and tune indexes and statistics.
3) Use SSD disks
With the cost of solid-state disks (SSDs) going down, use SSDs as a second tier of cache.
4) Use RAID 5 for the databases, and RAID 10 for the transaction logs and tempdb.
In general, the SQL optimizer's game is moving data from disk (low speed) to cache (memory, high speed).
Increase memory and improve disk IO speed, and you gain performance.
I have been monitoring the performance of an OLTP database (approx. 150GB); the average disk sec/read and average disk sec/write values are exceeding 20 ms over a 24hr period.
I need to arrive at a clear explanation as to why the business application has no influence over the 'less-than-stellar' performance on these counters. I also need to exert some pressure to have the storage folk re-examine their configuration as it applies to the placement of the mdf, ldf and tempdb files on their SAN. At present, my argument is shaky but I am pressing my point with people who don't understand the difference between IOPs and disk latency.
Beyond the limitations of physical hardware and the placement of data files across physical disks, is there anything else that would influence these counter values? For instance: the number of transactions per second, the size of the query, poorly written queries or missing indexes? My readings say 'no' but I need a voice of authority in this debate.
There are "a lot" of factors that can affect overall latency. To truly determine whether it is the SAN or not, you will want to look at the "Avg. Disk sec/Read" and "Avg. Disk sec/Write" counters that you mentioned. Just make sure you are looking at the "Physical Disk" object, not the "Logical Disk" object. The logical disk counter includes file-system overhead and may differ, depending on various factors.
Once you have the counters for the physical disks, you will want to compare them to the latency counters for the storage unit the server is connected to. You mentioned "storage folk", so I'm going to assume that is a different team; hopefully they will be nice and provide the info to you.
If it is a storage unit issue, then both sets of counters should match up pretty well; that indicates the storage unit is truly running slow. If the storage unit counters look significantly better, then the problem is somewhere in between. Depending on what type of storage network you are using, this would be the HBA/NIC/switches that connect the server and storage together. Or, if it's a VM, the host machine's stats would be useful as well.
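As a cross-check from inside SQL Server, a query like the following (a sketch against the sys.dm_io_virtual_file_stats DMV; the alias names are mine) reports average per-file I/O latency since the instance started, which you can set beside the Perfmon numbers:

```sql
-- Average I/O latency per database file since SQL Server last started.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;
```

Note these figures are cumulative averages since startup, so a recent slowdown can be diluted by a long quiet history.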
Apart from obvious reasons such as "not enough memory for buffer pool", latency mostly depends on how your storage is actually implemented.
If your server has an external SAN, the usual problem is that it may give you stellar throughput, but it will never (again, usually) give you stellar latency. It's just the way things are. It can become a real headache for heavily loaded OLTP systems, sure.
So, if you are about to squeeze every last microsecond from your storage, most probably you will need local drives. That, and your RAID 10 should have enough spindles to cope with the load.
Will the performance of a SQL server drastically degrade if the database is bigger than the RAM? Or does only the index have to fit in the memory? I know this is complex, but as a rule of thumb?
Only the working set, i.e. commonly or currently used data, needs to fit into the buffer cache (aka data cache). This includes indexes too.
There is also the plan cache, network buffers, and other stuff too. MS have put a lot of work into memory management in SQL Server and it works well, IMHO.
Generally, more RAM will help but it's not essential.
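If you want to see how much of each database currently sits in the buffer cache, a sketch like this works (sys.dm_os_buffer_descriptors lists one row per cached 8KB page):

```sql
-- Buffer pool usage per database, in MB (8KB pages).
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024  AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;
```

Comparing cached_mb to each database's size gives a rough feel for how much of the working set fits in RAM.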
Yes: when indexes can't fit in memory, or when doing full table scans. Aggregate functions over data not in memory will also require many (and possibly random) disk reads.
For some benchmarks:
Query time will depend significantly on whether the affected data currently resides in memory or disk access is required. For disk-intensive operations, the characteristics of the disk's sequential and random I/O performance are also important.
http://www.sql-server-performance.com/articles/per/large_data_operations_p7.aspx
Therefore, don't expect the same performance if your DB size > RAM size.
Edit:
http://highscalability.com/ is full of examples like:
Once the database doesn't fit in RAM you hit a wall.
http://highscalability.com/blog/2010/5/3/mocospace-architecture-3-billion-mobile-page-views-a-month.html
Or here:
Even if the DB size is just 10% bigger than RAM size this test shows a 2.6 times drop in performance.
http://www.mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
Remember, though, that this applies to hot data: data that you want to query over and can't cache. If you can cache it, you can easily live with significantly less memory.
All DB operations have to be backed by writes to disk; having more RAM is helpful, but not essential.
Loading the whole database into RAM is not practical. Databases can be terabytes in size these days, and there is little chance anyone would buy that much RAM. I think performance can still be good even if the available RAM is only one tenth of the size of the database.
I'm looking to run PostgreSQL in RAM for a performance boost. The database isn't more than 1GB and shouldn't ever grow beyond 5GB. Is it worth doing? Are there any benchmarks out there? Is it buggy?
My second major concern is: how easy is it to back things up when it's running purely in RAM? Is this just like using RAM as a tier-1 hard drive, or is it much more complicated?
It might be worth it if your database is I/O bound. If it's CPU-bound, a RAM drive will make no difference.
But first things first, you should make sure that your database is properly tuned, you can get huge performance gains that way without losing any guarantees. Even a RAM-based database will perform badly if it's not properly tuned. See PostgreSQL wiki on this, mainly shared_buffers, effective_cache_size, checkpoint_*, default_statistics_target
Second, if you want to avoid synchronizing disk buffers on every commit (as codeka explained in his comment), disable the synchronous_commit configuration option. If your machine loses power, you will lose some of the most recent transactions, but your database will still be 100% consistent. In this mode, RAM is used to buffer all writes, including writes to the transaction log. So with very rare checkpoints and large shared_buffers and wal_buffers, it can actually approach speeds close to those of a RAM drive.
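A sketch of what that could look like in postgresql.conf (all parameter names are real; the values are illustrative and must be sized to your machine):

```
# postgresql.conf - illustrative values for a ~1GB database on a RAM-rich box
shared_buffers = 1GB          # large enough to hold the whole database
effective_cache_size = 4GB    # tell the planner the OS cache is big
wal_buffers = 16MB            # buffer WAL writes in memory
checkpoint_timeout = 30min    # very rare checkpoints
synchronous_commit = off      # don't flush WAL on every commit
```

With synchronous_commit = off, only the last few hundred milliseconds of commits are at risk after a power loss; consistency is preserved.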
Also hardware can make a huge difference. 15000 RPM disks can, in practice, be 3x as fast as cheap drives for database workloads. RAID controllers with battery-backed cache also make a significant difference.
If that's still not enough, then it may make sense to consider turning to volatile storage.
Whether to hold your database in memory depends on size and performance, as well as how robust you want it to be with respect to writes. I assume you are writing to your database and that you want to persist the data in case of failure.
Personally, I would not worry about this optimization until I ran into performance issues. It just seems risky to me.
If you are doing a lot of reads and very few writes, a cache might serve your purpose. Many ORMs come with one or more caching mechanisms.
From a performance point of view, clustering across a network to another DBMS that does all the disk writing, seems a lot more inefficient than just having a regular DBMS and having it tuned to keep as much as possible in RAM as you want.
Actually... as long as you have enough memory available, your database will already be effectively running in RAM. Your filesystem will completely buffer all the data, so it won't make much of a difference.
But... there is of course always a bit of overhead, so you can still try running it all from a RAM drive.
As for backups, it's just like any other database: you can use the normal Postgres dump utilities to back up the system. Or, even better, replicate to another server as a backup.
In-memory DBMSs can be 5 to 40 times faster than disk-resident DBMSs. Check out Gartner's Magic Quadrant for Operational DBMSs 2013.
Gartner shows who is strong and, more importantly, notes severe cautions: bugs, errors, lack of support, and vendors whose products are hard to use.