Now that AWS is offering NVMe instance storage through the i3 range of instances, is there a best practice for hosting a database on the instance storage of one of these?
My understanding is that if the instance is stopped, the storage may be completely wiped. This doesn't appear to be the case if the server reboots, intentionally or unintentionally, but you are still one button press away from wiping important data, so this is quite scary.
My understanding of the underlying infrastructure is that this is because the NVMe storage is directly attached to the physical host, so if Amazon decides to move your VM to another host you would lose your data. AWS aside, it would also be bad practice to keep mission-critical data on a single hardware device.
But given the performance benefits of NVMe over EBS (SAN?) storage, what would a recommended setup be? VM Replicas, transaction log backups to permanent storage, etc.
It is possible to turn the NVMe SSDs on i3 instances into persistent highly available storage.
Options:
1) Mirroring between NVMe SSDs on 2 or 3 instances
2) Mirroring between NVMe SSDs and EBS (EBS can be on a different instance) with reads primarily from NVMe SSDs.
While write performance will still be limited by the network or by EBS, you get the full read performance of the NVMe SSDs. In most cases read bandwidth is what large databases really need for running heavy queries. A rough sketch of option 2 follows.
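As an illustration of option 2, here is a minimal sketch (Python driving mdadm; the device names /dev/nvme0n1 and /dev/xvdf and the mount point are assumptions for a typical i3 setup, not something prescribed by the whitepaper) that mirrors a local NVMe SSD with an EBS volume and directs reads at the NVMe side:

```python
#!/usr/bin/env python3
"""Sketch: RAID1 mirror of an instance-store NVMe SSD and an EBS volume,
with reads served primarily from the NVMe member (option 2 above).
Device names and mount point are placeholders for a typical i3 instance."""
import subprocess

NVME_DEV = "/dev/nvme0n1"   # instance-store NVMe SSD (fast, ephemeral)
EBS_DEV = "/dev/xvdf"       # attached EBS volume (durable, slower)
MD_DEV = "/dev/md0"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create a two-way mirror. Marking the EBS member --write-mostly asks the
# md driver to satisfy reads from the NVMe member whenever possible, so read
# bandwidth is that of the local SSD while every write still lands on EBS.
run([
    "mdadm", "--create", MD_DEV,
    "--level=1", "--raid-devices=2",
    NVME_DEV,
    "--write-mostly", EBS_DEV,
])

# Put a filesystem on the mirror and mount it for the database files.
run(["mkfs.xfs", MD_DEV])
run(["mkdir", "-p", "/var/lib/db"])
run(["mount", MD_DEV, "/var/lib/db"])
```

Note that after an instance stop/start the NVMe member comes back empty, so it has to be re-added to the array and resynced from the EBS copy before redundancy (and full read performance) is restored.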
However, there are still questions about failing over the database between the instances and restoring redundancy after an instance stop/start or failure.
Check this whitepaper and page 9 specifically for details about how it is done for Oracle database clusters:
https://www.flashgrid.io/wp-content/sideuploads/resources/FlashGrid_OracleRAC_on_AWS.pdf
The paper is focused on Oracle RAC databases, but the same solution works for single-instance Oracle and for any other Linux-based database, though you would still need Oracle Clusterware (which is free).
I would like to know the effect of network performance on a DB (and also the exact name of the metric AWS uses for RDS instance types).
I am loading Graph database with data at scale using parallel processing (multiple Pods in Kubernetes).
I noticed that simply by changing from one RDS instance type to a more powerful one, and monitoring the DB metrics in the AWS console, performance roughly doubled.
The metrics that improved are:
VolumeWriteIOPs - doubled
Network Throughput - doubled
VolumeReadIOPs - tripled
As the better instance type has more CPU, RAM, disk, and possibly network performance (I believe there is an 'invisible' network performance tiering that is not shown in the instance specs), I suppose my question really is: if there were a (hypothetical) instance with the same CPU, the same RAM, and the same disk performance, what difference does network performance alone make to a DB?
Do DBs and RDS DBs process everything slower if the network performance is lower?
Or does it respond at the same speed, but only serve less connections (making the others wait)?
In my use case they are Kubernetes Pods which are writing to the DB, so does it serve each Pod more slowly, or is it non-responsive above a certain point?
According to this article and several others I've found:
Performance best practices for SQL Server
It is considered a best practice to disable caching on premium storage disks for the SQL Server Log disks. However, I can't find anywhere that explains why.
Does anyone have some insight?
Let me add that the reason I see disabling read-only cache on the log drive as an issue is that it forces you to set up two separate Storage Pools inside the VM, which makes upgrading/downgrading VMs inside Azure more problematic and considerably less performant.
For example, say you start with a DS V13, which has a 16-drive limit, but only about 6 of those drives can be maxed before you're throttled (25,000 IOPS). Since best practice says read-only cache for data and no cache for logs, you give 8 of those drives to the data and 8 to the log.
Now the server needs to be upgraded, so you upgrade it to a DS V14. Now you can max out 12 drives before being throttled (50,000 IOPS). However, your data drive's Storage Spaces column size is only 8, which caps it at 40,000 IOPS, so you're not using the VM's full IOPS potential.
However, if you start with a DS V13 and assign all 16 of those drives to a single Storage Pool, then put both the log and data on it, you can upgrade/downgrade all the way up to a DS V15 without any concern about leaving IOPS potential unused.
Another way to put it is: If you create a single Storage Pool for all 16 drives, you have considerably more flexibility in upgrading/downgrading the VM. If you have to create two Storage Pools, you do not.
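To make the throttling argument concrete, here is a back-of-the-envelope sketch in Python. The ~5,000 IOPS per premium disk figure is an assumption (roughly a P30-class disk); the VM caps are the rounded numbers quoted above.

```python
# Rough IOPS math for the two Storage Spaces layouts discussed above.
# Assumption: each premium disk delivers ~5,000 IOPS (P30-class);
# VM-level caps are the rounded figures quoted above.
PER_DISK_IOPS = 5_000
VM_CAPS = {"DS13": 25_000, "DS14": 50_000}

def pool_iops(columns: int, vm_cap: int) -> int:
    """Effective IOPS of a pool: limited by either the number of
    columns (disks striped across) or the VM-level throttle."""
    return min(columns * PER_DISK_IOPS, vm_cap)

# Split layout on a DS V14: 8 disks for data, 8 for log.
print("DS14, 8-column data pool:", pool_iops(8, VM_CAPS["DS14"]))   # 40000 - below the 50000 cap
# Single pool on a DS V14: all 16 disks holding both data and log.
print("DS14, 16-column pool:", pool_iops(16, VM_CAPS["DS14"]))      # 50000 - reaches the VM cap
# The same 16-column pool on a DS V13 is simply throttled at the VM cap.
print("DS13, 16-column pool:", pool_iops(16, VM_CAPS["DS13"]))      # 25000 - reaches the VM cap
```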
We recommend configuring the “None” cache setting on premium storage disks hosting the log files. Log files see primarily write-heavy operations and do not benefit from the ReadOnly cache, for two reasons:
Legitimate cache contents will get evicted by useless data from the log.
Log writes will also consume cache bandwidth/IOPS. If the cache is enabled on a disk (ReadOnly or ReadWrite), every write on that disk will also write that data into the cache, and every read will also access/put the data in the cache. Thus every IO will hit the cache if the cache is on.
Thanks,
Aung
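If you want to change the cache setting from code rather than the portal, something along these lines should work. This is a minimal sketch assuming the azure-identity and azure-mgmt-compute Python SDKs; the subscription ID, resource group, VM name, and log-disk LUN are placeholders.

```python
# Sketch: set host caching to "None" on the data disk that holds the log files.
# Assumes the azure-identity and azure-mgmt-compute SDKs; the subscription ID,
# resource group, VM name, and log-disk LUN below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "my-rg"      # placeholder
VM_NAME = "sql-vm-01"         # placeholder
LOG_DISK_LUN = 1              # placeholder: LUN of the disk hosting the log

client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
vm = client.virtual_machines.get(RESOURCE_GROUP, VM_NAME)

for disk in vm.storage_profile.data_disks:
    if disk.lun == LOG_DISK_LUN:
        disk.caching = "None"   # disable host caching on the log disk

# Push the updated VM model back to apply the change.
client.virtual_machines.begin_create_or_update(RESOURCE_GROUP, VM_NAME, vm).result()
```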
Log files are used as part of recovery and can help restore a database to a point in time. Having corrupt data in a log file from a power outage or hard reboot is not good with MSSQL. See the articles from MS below; they relate to older versions of SQL Server, but the purpose of the log file has not changed.
Information about using disk drive caches with SQL Server that every database administrator should know
https://support.microsoft.com/en-us/kb/234656
Description of caching disk controllers in SQL Server
https://support.microsoft.com/en-us/kb/86903
We're migrating our environment over to AWS from a colo facility. As part of that we are upgrading our 2 SQL Server 2005s to 2014s. The two are currently mirrored and we'd like to keep it that way or find other ways to make the servers redundant. The transaction volume/server use is light for our app, but it's in production, requires high availability, and, as a result, requires some kind of failover.
We have already set up one EC2 instance and put SQL Server 2014 on it (as opposed to using RDS, for licensing reasons) and are now exploring what to do next to achieve this.
What suggestions do people have to achieve the redundancy we need?
I've seen two options thus far from here and googling around. I list them below - we're very open to other options!
First, use the RDS mirroring service, but I can't tell whether that only applies if the principal server is also on RDS; it also doesn't help with licensing.
Second, use multiple availability zones. What are the pros/cons of this versus using different regions altogether (e.g., bandwidth issues) etc? And does multi-AZ actually give redundancy (if AWS goes down in Oregon, for example, then doesn't everything go down)?
Thanks for the help!
The Multi-AZ capability of Amazon RDS (Relational Database Service) is designed to offer high-availability for a database.
From Amazon RDS Multi-AZ Deployments:
When you provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. In case of an infrastructure failure (for example, instance hardware failure, storage failure, or network disruption), Amazon RDS performs an automatic failover to the standby, so that you can resume database operations as soon as the failover is complete. Since the endpoint for your DB Instance remains the same after a failover, your application can resume database operation without the need for manual administrative intervention.
Multiple Availability Zones are recommended to improve availability of systems. Each AZ is a separate physical facility such that any disaster that should befall one AZ should not impact another AZ. This is normally considered sufficient redundancy rather than having to run across multiple Regions. It also has the benefit that data can be synchronously replicated between AZs due to low-latency connections, while this might not be possible between Regions since they are located further apart.
One final benefit... The Multi-AZ capability of Amazon RDS can be activated by simply selecting "Yes" when the database is launched. Running your own database and using mirroring services requires you to do considerably more work on an on-going basis.
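For completeness, enabling Multi-AZ from code is a single flag on the instance. Here is a minimal boto3 sketch; the identifier, instance class, storage size, engine, and credentials are placeholders, and Multi-AZ availability depends on the engine and edition you choose.

```python
# Sketch: launch a Multi-AZ RDS instance with boto3.
# Identifier, instance class, storage, engine, and credentials are placeholders;
# Multi-AZ support depends on the chosen engine/edition.
import boto3

rds = boto3.client("rds", region_name="us-west-2")

rds.create_db_instance(
    DBInstanceIdentifier="prod-sql",        # placeholder
    Engine="sqlserver-se",                  # placeholder: SQL Server Standard Edition
    LicenseModel="license-included",
    DBInstanceClass="db.m5.large",          # placeholder
    AllocatedStorage=200,                   # GiB, placeholder
    MasterUsername="admin",
    MasterUserPassword="change-me",         # placeholder; use a secrets store in practice
    MultiAZ=True,                           # synchronous standby in another Availability Zone
)
```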
I'm wondering if Amazon EC2+EBS can handle large Oracle databases (7TB to start with). Sounds like EBS can have storage volumes of up to 1TB and I could have many storage volumes attached to the same EC2 instance, but is it possible then to configure Oracle to use those storage volumes so that the database can grow to 7TB and beyond?
To pursue this I would bring in Oracle DBAs to assist, but I want to figure out if this is even a valid approach, or should we look elsewhere?
What other options are there for large (7-15 TB) databases in the cloud?
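To give an idea of the mechanics, here is a minimal boto3 sketch that creates and attaches several 1 TB EBS volumes to a single instance; Oracle ASM or LVM would then stripe the database across them. The instance ID, availability zone, device names, and volume type are placeholders.

```python
# Sketch: attach multiple 1 TB EBS volumes to one EC2 instance so that
# ASM (or LVM) can stripe a large database across them.
# Instance ID, availability zone, device names, and volume type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

INSTANCE_ID = "i-0123456789abcdef0"   # placeholder
AZ = "us-east-1a"                     # must match the instance's AZ
DEVICES = ["/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi",
           "/dev/sdj", "/dev/sdk", "/dev/sdl", "/dev/sdm"]  # 8 x 1 TB = 8 TB raw

for device in DEVICES:
    vol = ec2.create_volume(Size=1024, AvailabilityZone=AZ, VolumeType="gp2")
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=INSTANCE_ID, Device=device)
```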
Yes, you can, but it can be painful. For instances of this size you want tape backup, fast storage, and, most importantly, Automatic Storage Management (ASM).
When using ASM, you can run the Oracle processes in a cloud, but not the storage. It is not really possible to use ASM in the cloud: it relies on specific hardware instructions to make storage fast, and the VM layer would get in the way and make it too slow.
Running Oracle without ASM for 5TB+ of data is not practical.
Important:
If you have 5TB of data, you need AT LEAST 13TB of disk space to run an HA Oracle instance.
In my company we run a 15TB Oracle database in the cloud, but we hired dedicated storage devices. You can't do that with Amazon (try mediatemple).
I want to put my SQL Server database files on an Intel SS4000-E, which is a NAS device. Is it possible to use it as storage for SQL Server 2000? If not, what is the best solution?
I strongly recommend against it.
Put your data files locally on the server itself, with RAID mirrored drives. The reasons are twofold:
SQL Server will run much faster for all but the smallest workloads
SQL Server will be much less prone to corruption in case the link to the NAS gets broken.
Use the NAS to store backups of your SQL Server, not to host your data files. I don't know what your database size or usage pattern will be, so I can't tell you what you MUST have. At a minimum, for a database that's going to take any significant load in a production environment, I would recommend two logical drives (one for data, one for your transaction log), each consisting of a RAID 1 array of the fastest drives you can stomach buying. If that's overkill, put your database on just two physical drives (one for the transaction log and one for data). If even THAT is over budget, put your data on a single drive and back up often. But if you choose the single-drive or NAS solution, IMO you are putting your faith in the Power of Prayer (which may not be a bad thing, it just isn't that effective when designing databases).
Note that a NAS is not the same thing as a SAN (on which people typically DO put database files). A NAS typically is much slower and has much less bandwidth than a SAN connection, which is designed for very high reliability, high speed, advanced management, and low latency. A NAS is geared more toward reducing your cost of network storage.
My gut reaction: I think you're mad risking your data on a NAS. SQL Server expects continuous, low-latency, uninterrupted access to its storage subsystem, and a NAS is almost certainly none of those things. Use local or SAN storage (in that order of performance and simplicity, and therefore preference) and leave the NAS for offline file storage/backups.
The following KB lists some of the constraints and issues you'd encounter trying to use a NAS with SQL - while the KB covers SQL 7 through 2005, a lot of the information still applies to SQL 2008 too.
http://support.microsoft.com/kb/304261
Local storage is almost always faster than networked storage.
Your SQL Server performance will depend on how your objects, files, and filegroups are defined, and on how consumers use the data.
Well "best" means different things to different people, but I think "best" performance would be a TMS RAMSAN or a RAID of SSDs... etc
Best capacity would be achieved with a RAID of large HDDs...
Best reliability/data saftey would be achieved with Mirroring across many drives, and regular backups (off site preferably)...
Best availability... I don't know... maybe a clone the system and have a hot backup ready to go at all times.
Best security would require encryption, but mainly limiting physical access to the machine (and it's backups) is enough unless it's internet connected.
As the other answers point out, there will be a performance penalty here.
It is also worth mentioning that these devices sometimes implement a RAM cache to improve I/O performance. If that is the case and you do trial this config, the NAS should be on the same power protection/UPS as the server hardware; otherwise, in the case of a power outage, the NAS may 'lose' the part of the file held in its cache. Ouch!
It can work but a dedicated fiber attached SAN will be better.
Local will usually be faster but it has limited size and won't scale easily.
I'm not familiar with the hardware but we initially deployed a warehouse on a shared NAS. Here's what we found.
We were regularly competing for resources on the head unit -- there was only so much bandwidth that it could handle. Massive warehouse queries and data loads were severely impacted.
We needed 1.5 TB for our warehouse (data/indexes/logs), and we put each of these onto a separate set of LUNs (like you might do with attached storage), with the data spanning just 10 disks. We ran into all sorts of IO bottlenecks with this. The better solution was to create one big partition across lots of small disks and store data, indexes, and logs all in the same place. This sped things up considerably.
If you're dealing with a moderately used OLTP system, you might be fine but a NAS can be troublesome.