Effect of Network Performance on Database (RDS)

I would like to know what effect Network Performance (and what exactly AWS means by that metric in the RDS instance type specs) has on a DB.
I am loading a graph database with data at scale using parallel processing (multiple Pods in Kubernetes).
I noticed that by simply changing from one RDS instance type to a more powerful one, and monitoring the DB metrics in the AWS console, performance doubled.
The metrics that improved are:
VolumeWriteIOPs - doubled
Network Throughput - doubled
VolumeReadIOPs - tripled
As the better instance type has more CPU, RAM, and disk, and possibly better network performance (I believe there is an 'invisible' network performance tiering that is not shown in the instance specs), I suppose my question really is: if there were a (hypothetical) instance with the same CPU, the same RAM, and the same disk performance, what difference would network performance alone make to a DB?
Does a DB (or an RDS DB) process everything more slowly if the network performance is lower?
Or does it respond at the same speed, but serve fewer connections (making the others wait)?
In my use case it is Kubernetes Pods that are writing to the DB, so does it serve each Pod more slowly, or does it become non-responsive above a certain point?
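One way to tell those scenarios apart empirically is to measure per-request latency and total throughput at increasing concurrency from the Pods: if the network (or any other resource) is the ceiling, throughput plateaus while per-request latency climbs, rather than the DB going non-responsive. A minimal probe sketch, assuming a PostgreSQL-compatible endpoint and the psycopg2 driver purely for illustration; the DSN and the load_test table are placeholders, and the same idea applies with a graph database's own driver:

    # Probe: does per-request latency rise with concurrency, or does throughput
    # just plateau? Endpoint, credentials and the load_test table are placeholders.
    import statistics
    import time
    from concurrent.futures import ThreadPoolExecutor

    import psycopg2

    DSN = "host=my-rds-endpoint dbname=test user=loader password=secret"  # placeholder

    def timed_write(_):
        t0 = time.perf_counter()
        with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
            cur.execute("INSERT INTO load_test (payload) VALUES (repeat('x', 1000))")
        return time.perf_counter() - t0

    for workers in (1, 4, 16, 40):
        t_start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = sorted(pool.map(timed_write, range(workers * 25)))
        elapsed = time.perf_counter() - t_start
        print(f"workers={workers:3d}"
              f"  p50={statistics.median(latencies) * 1000:6.1f} ms"
              f"  p95={latencies[int(len(latencies) * 0.95)] * 1000:6.1f} ms"
              f"  throughput={len(latencies) / elapsed:6.1f} req/s")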

Related

ec2 as bastion host and its performance

I have a Redshift cluster in a private subnet and an EC2 instance in a public subnet which is just a bastion host for my Redshift. All is working well and I can connect to my Redshift over the internet (through SSH).
Now I want to redo the setup in the production environment and I need to choose an EC2 instance size (nano, micro, etc.). My doubt is whether my EC2 instance's performance depends on the amount of data a query transfers. That is, let's say my Redshift returns a huge amount of data for a query; will the EC2 instance throttle performance?
Basically, I don't want my EC2 instance to be a performance bottleneck, and I am not sure whether it will be. Any thoughts?
Thanks in advance!
Firstly, you can change the Instance Type of an Amazon EC2 instance at any time. Just stop the instance, change the Instance Type and start it again. So, start with t2.nano and make it bigger if you find any performance problems.
Secondly, your use-case will consume very little RAM and very little CPU. You can look at Amazon CloudWatch metrics to monitor CPU utilization and you can use operating system tools to monitor memory (or use Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances).
Bottom line: Measure and monitor your existing environment and the production environment. Change Instance Type as necessary. Don't sweat it.
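If you want to script that monitoring rather than watching the console, a small CloudWatch query is enough; the sketch below assumes boto3 is configured with credentials and a region, and the instance ID is a placeholder:

    # Pull hourly CPU and network-out averages for the bastion instance.
    # Assumes boto3 credentials/region are configured; the instance ID is a placeholder.
    from datetime import datetime, timedelta

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    instance_id = "i-0123456789abcdef0"  # placeholder

    for metric in ("CPUUtilization", "NetworkOut"):
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName=metric,
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=datetime.utcnow() - timedelta(hours=1),
            EndTime=datetime.utcnow(),
            Period=300,                 # 5-minute buckets
            Statistics=["Average"],
        )
        for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
            print(metric, point["Timestamp"], round(point["Average"], 2))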

AWS NVMe Storage - hosting a database

Now that AWS are offering NVMe through the i3 range of servers, is there a best practice for hosting a database on the instance storage of one of these?
My understanding is that if the instance is stopped, the storage may be completely wiped. This doesn't appear to be the case if the server reboots, intentionally or unintentionally, but you are still one button press away from wiping important data, so this is quite scary.
My understanding of the underlying infrastructure is that this is because the NVMe storage is directly attached to the physical host, so if Amazon decides to move your VM to another host you would lose your data. Also, AWS aside, it would be bad to store mission-critical data on a single hardware device.
But given the performance benefits of NVMe over EBS (SAN?) storage, what would a recommended setup be? VM Replicas, transaction log backups to permanent storage, etc.
It is possible to turn the NVMe SSDs on i3 instances into persistent highly available storage.
Options:
1) Mirroring between NVMe SSDs on 2 or 3 instances
2) Mirroring between NVMe SSDs and EBS (EBS can be on a different instance) with reads primarily from NVMe SSDs.
While write performance will still be limited by network or EBS, you do get full read performance of NVMe. In most cases read bandwidth is what large databases really need for running heavy queries.
However, there are still questions about failing over the database between the instances and restoring redundancy after an instance stop/start or failure.
Check this whitepaper and page 9 specifically for details about how it is done for Oracle database clusters:
https://www.flashgrid.io/wp-content/sideuploads/resources/FlashGrid_OracleRAC_on_AWS.pdf
The paper is focused on Oracle RAC databases, but the same solution works for single-instance Oracle and for any other Linux-based database, although you would still need Oracle Clusterware (which is free).
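For option 2 above, the generic Linux building block (as opposed to the FlashGrid tooling the whitepaper describes) is an md RAID-1 set with the EBS member marked write-mostly, so reads are served from the local NVMe. A rough sketch; the device names (/dev/nvme1n1 for the instance store, /dev/xvdf for the EBS volume) and the mount point are assumptions that will differ per instance:

    # Rough sketch of option 2: RAID-1 across local NVMe and EBS, with the EBS
    # member marked --write-mostly so reads are served by the NVMe SSD.
    # Device names and mount point are assumptions; this is plain Linux md,
    # not the FlashGrid tooling from the whitepaper.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["mdadm", "--create", "/dev/md0",
         "--level=1", "--raid-devices=2",
         "/dev/nvme1n1",                  # local NVMe: fast reads
         "--write-mostly", "/dev/xvdf"])  # EBS: durable copy, avoided for reads

    run(["mkfs.xfs", "/dev/md0"])
    run(["mount", "/dev/md0", "/var/lib/db"])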

Fastest throughput local, RAM-cached DB

I'm looking for a DB solution for a high performance application.
The database will need to be local and stored in RAM for performance and will be several GB in size.
It will be local to the application, but it may be accessed by multiple processes running on the machine (up to 40). The data in the DB is immutable once it's been inserted and I only need a basic key value store rather than anything relational.
The obvious candidates are Memcached and Redis, but I believe they both have limitations with overhead and bottlenecks from the network component.
Something like Berkeley DB would also appear to be ideal, but it's only single process as far as I can see.
Throughput is the most important consideration (more so than latency).
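For what it's worth, one family of options that avoids the network component entirely is a memory-mapped embedded store that multiple local processes can open concurrently. Purely as an illustration (not something recommended in the thread), a sketch using the lmdb Python package, with the path and map size as placeholders:

    # Illustration only: a memory-mapped embedded key-value store lets many local
    # processes share reads with no network round trip. The lmdb package, the
    # /dev/shm path and the map size are assumptions, not recommendations from
    # the thread.
    import lmdb

    env = lmdb.open("/dev/shm/kvstore", map_size=8 * 1024**3)  # RAM-backed path, ~8 GB cap

    # Writer process: insert the immutable records once.
    with env.begin(write=True) as txn:
        txn.put(b"key-1", b"value-1")

    # Each reader process (up to ~40) opens the same path, ideally read-only.
    with env.begin() as txn:
        print(txn.get(b"key-1"))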

Buffering question in Microsoft SQL Server

I'd like to know how to configure Microsoft SQL Server to work in the following manner:
All DB writes are "write-behind" and all queries operate primarily out of the RAM cache (for speed); i.e. it persists the data to the hard drive at its leisure, in the background.
The reason? Speed. We assume 99.99% reliability of the underlying machine (it's an Amazon EC2 instance), so we don't mind caching all of the data in RAM (and even if there is a failure, we can just rebuild the database ourselves from 3rd-party data sources).
For example:
User 1 writes data packet X to the database.
User 2 queries this same data packet X, 2ms later.
User 2 should see data packet X, as SQL will serve it straight out of its RAM cache (even if data packet X hasn't been persisted to the hard drive).
Data packet X will be persisted to the hard drive at leisure, maybe 500ms later.
If you have large amounts of memory and have set a high Min Server Memory setting on your SQL Server instance, then SQL Server will attempt to maximize its use.
The Checkpoint process is the thing that forces the dirty pages to be written to disk (which happens automatically but can also be triggered manually), so you might want to have a read of the following.
http://msdn.microsoft.com/en-us/library/ms188748.aspx
This subject is quite involved and can be affected by the hardware and solution you are using. For instance, virtualization brings a whole raft of other considerations.
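If you do pin more of the buffer pool in RAM and want to control when dirty pages hit disk, the relevant knobs are sp_configure's memory settings and the CHECKPOINT command described in the linked article. A hedged sketch via pyodbc; the connection string and the 8192 MB figure are placeholders:

    # Sketch: raise SQL Server's minimum server memory so the buffer pool stays
    # large, then force dirty pages to disk on demand with CHECKPOINT.
    # Connection string and the 8192 MB value are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;UID=sa;PWD=secret",
        autocommit=True,  # sp_configure/RECONFIGURE should not run inside a user transaction
    )
    cur = conn.cursor()

    cur.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")
    cur.execute("EXEC sp_configure 'min server memory (MB)', 8192; RECONFIGURE;")

    # Later, flush the dirty pages in the buffer cache explicitly:
    cur.execute("CHECKPOINT;")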

What's the best storage for SQL Server 2000?

I want to access my SQL Server database files on an Intel SS4000-E storage device. It's NAS storage. Could it work as storage for SQL Server 2000? If not, what is the best solution?
I strongly recommend against it.
Put your data files locally on the server itself, with RAID mirrored drives. The reasons are twofold:
SQL Server will run much faster for all but the smallest workloads
SQL Server will be much less prone to corruption in case the link to the NAS gets broken.
Use the NAS to store backups of your SQL Server, not to host your datafiles. I don't know what your database size will be, or what your usage pattern will be, so I can't tell you what you MUST have. At a minimum for a database that's going to take any significant load in a production environment, I would recommend two logical drives (one for data, one for your transaction log), each consisting of a RAID 1 array of the fastest drives you can stomach to buy. If that's overkill, put your database on just two physical drives, (one for the transaction log, and one for data). If even THAT is over budget, put your data on a single drive, back up often. But if you choose the single-drive or NAS solution, IMO you are putting your faith in the Power of Prayer (which may not be a bad thing, it just isn't that effective when designing databases).
Note that a NAS is not the same thing as a SAN (on which people typically DO put database files). A NAS typically is much slower and has much less bandwidth than a SAN connection, which is designed for very high reliability, high speed, advanced management, and low latency. A NAS is geared more toward reducing your cost of network storage.
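To make the "one logical drive for data, one for the transaction log" layout concrete, file placement is fixed when the database is created (or changed later with ALTER DATABASE). A hedged example driven from pyodbc; the connection string, drive letters, paths, and sizes are placeholders:

    # Example of the recommended layout: data file on one mirrored drive, log on
    # another. Connection string, drive letters, paths and sizes are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=myserver;UID=sa;PWD=secret",
        autocommit=True,  # CREATE DATABASE cannot run inside a user transaction
    )
    conn.cursor().execute("""
        CREATE DATABASE SalesDb
        ON PRIMARY (
            NAME = SalesDb_data,
            FILENAME = 'D:\\SQLData\\SalesDb.mdf',
            SIZE = 10240MB
        )
        LOG ON (
            NAME = SalesDb_log,
            FILENAME = 'E:\\SQLLogs\\SalesDb.ldf',
            SIZE = 2048MB
        )
    """)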
My gut reaction - I think you're mad risking your data on a NAS. SQL Server's expectation is continuous, low-latency, uninterrupted access to your storage subsystem. The NAS is almost certainly none of those things - use local or SAN storage (in that order of performance and simplicity, and therefore preference) and leave the NAS for offline file storage/backups.
The following KB lists some of the constraints and issues you'd encounter trying to use a NAS with SQL - while the KB covers SQL 7 through 2005, a lot of the information still applies to SQL 2008 too.
http://support.microsoft.com/kb/304261
Local is almost always faster than networked storage.
Your SQL Server performance will also depend on how your objects, files, and filegroups are defined, and how consumers use the data.
Well "best" means different things to different people, but I think "best" performance would be a TMS RAMSAN or a RAID of SSDs... etc
Best capacity would be achieved with a RAID of large HDDs...
Best reliability/data safety would be achieved with mirroring across many drives, and regular backups (off-site preferably)...
Best availability... I don't know... maybe clone the system and have a hot backup ready to go at all times.
Best security would require encryption, but mainly limiting physical access to the machine (and its backups) is enough unless it's internet-connected.
As the other answers point out, there will be a performance penalty here.
It is also worth mentioning that these devices sometimes implement a RAM cache to improve I/O performance. If that is the case and you do trial this config, the NAS should be on the same power protection / UPS as the server hardware; otherwise, in the event of a power outage, the NAS may 'lose' the part of the file held in its cache. Ouch!
It can work but a dedicated fiber attached SAN will be better.
Local will usually be faster but it has limited size and won't scale easily.
I'm not familiar with the hardware but we initially deployed a warehouse on a shared NAS. Here's what we found.
We were regularly competing for resources on the head unit -- there was only so much bandwidth that it could handle. Massive warehouse queries and data loads were severely impacted.
We needed 1.5 TB for our warehouse (data/indexes/logs), and we put each of these resources onto a separate set of LUNs (like you might do with attached storage), with the data spanning just 10 disks. We ran into all sorts of IO bottlenecks with this. The better solution was to create one big partition across lots of small disks and store data, indexes, and logs all in the same place. This sped things up considerably.
If you're dealing with a moderately used OLTP system, you might be fine but a NAS can be troublesome.
