I have a Redshift cluster in a private subnet and an EC2 instance in a public subnet that serves purely as a bastion host for Redshift. Everything works well, and I can connect to Redshift over the internet through the bastion (SSH).
Now I want to recreate this setup in the production environment, and I need to choose an EC2 instance type (nano, micro, etc.). My doubt is whether the EC2 instance's performance depends on how much data a query transfers. That is, if Redshift returns a huge amount of data for a query, will the EC2 instance throttle performance?
Basically, I don't want the EC2 instance to become a performance bottleneck, and I am not sure whether it will. Any thoughts?
Thanks in advance!
Firstly, you can change the Instance Type of an Amazon EC2 instance at any time. Just stop the instance, change the Instance Type and start it again. So, start with t2.nano and make it bigger if you find any performance problems.
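For example, here is a minimal boto3 sketch of that stop, resize, start cycle (the instance ID and target type are placeholders):

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"   # placeholder instance ID

    # Stop the instance and wait until it is fully stopped
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Change the instance type (e.g. from t2.nano to t2.small)
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "t2.small"},
    )

    # Start it again
    ec2.start_instances(InstanceIds=[instance_id])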
Secondly, your use-case will consume very little RAM and very little CPU. You can look at Amazon CloudWatch metrics to monitor CPU utilization and you can use operating system tools to monitor memory (or use Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances).
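As a rough illustration, this boto3 sketch pulls the last 24 hours of CPUUtilization for the bastion from CloudWatch (the instance ID is a placeholder):

    import datetime
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    instance_id = "i-0123456789abcdef0"   # placeholder instance ID

    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(hours=24)

    # Average and peak CPU utilization in 5-minute periods over the last day
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Average", "Maximum"],
    )

    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], round(point["Average"], 1), round(point["Maximum"], 1))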
Bottom line: Measure and monitor your existing environment and the production environment. Change Instance Type as necessary. Don't sweat it.
Related
I would like to know the effect of network performance on a DB (and also the exact name of the metric AWS uses for RDS instance types).
I am loading Graph database with data at scale using parallel processing (multiple Pods in Kubernetes).
I noticed that, simply by changing from one RDS instance type to a more powerful one and monitoring the DB metrics in the AWS console, performance roughly doubled.
The metrics that improved are:
VolumeWriteIOPs - doubled
Network Throughput - doubled
VolumeReadIOPs - tripled
As the better instance type has more CPU, RAM, and disk, and possibly better network performance (I believe there is an 'invisible' network performance tiering that is not shown in the instance specs), I suppose my question really is -- if there were a (hypothetical) instance with the same CPU, same RAM, and same disk performance, what difference would network performance alone make to a DB?
Do databases, and RDS databases in particular, process everything more slowly if the network performance is lower?
Or do they respond at the same speed but serve fewer connections (making the others wait)?
In my use case they are Kubernetes Pods which are writing to the DB, so does it serve each Pod more slowly, or is it non-responsive above a certain point?
Now that AWS are offering NVMe through the i3 range of servers, is there a best practice for hosting a database on the instance storage of one of these?
My understanding is that if the instance is stopped, the storage may be completely wiped. This doesn't appear to be the case if the server reboots, intentionally or unintentionally, but you are still one button press away from wiping important data so this is quite scary.
My understanding of the underlying infrastructure is that this is because the NVMe storage is directly attached to the physical host, so if Amazon decides to move your VM to another host, you would lose your data. Also, AWS aside, it would be bad to store mission-critical data on a single hardware device.
But given the performance benefits of NVMe over EBS (SAN?) storage, what would a recommended setup be? VM Replicas, transaction log backups to permanent storage, etc.
It is possible to turn the NVMe SSDs on i3 instances into persistent highly available storage.
Options:
1) Mirroring between NVMe SSDs on 2 or 3 instances
2) Mirroring between NVMe SSDs and EBS (EBS can be on a different instance) with reads primarily from NVMe SSDs.
While write performance will still be limited by network or EBS, you do get full read performance of NVMe. In most cases read bandwidth is what large databases really need for running heavy queries.
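As an illustration of option 2, one simple way to do this on plain Linux (this is not the FlashGrid approach referenced below, just a generic sketch with assumed device names and mount point; check lsblk on your instance) is an md RAID1 mirror with the EBS leg marked write-mostly, so reads stay on the local NVMe device:

    import subprocess

    NVME_DEV = "/dev/nvme0n1"   # assumed instance-store NVMe device
    EBS_DEV = "/dev/xvdf"       # assumed attached EBS volume
    MOUNT_POINT = "/data"       # assumed database data directory

    # Create a RAID1 array; --write-mostly on the EBS device keeps reads on the NVMe leg
    subprocess.run(
        ["mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
         NVME_DEV, "--write-mostly", EBS_DEV],
        check=True,
    )

    # Put a filesystem on the array and mount it for the database
    subprocess.run(["mkfs.xfs", "/dev/md0"], check=True)
    subprocess.run(["mkdir", "-p", MOUNT_POINT], check=True)
    subprocess.run(["mount", "/dev/md0", MOUNT_POINT], check=True)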
However, there are still questions about failing over the database between the instances and restoring redundancy after an instance stop/start or failure.
Check this whitepaper and page 9 specifically for details about how it is done for Oracle database clusters:
https://www.flashgrid.io/wp-content/sideuploads/resources/FlashGrid_OracleRAC_on_AWS.pdf
The paper is focused on Oracle RAC databases, but the same solution works for single-instance Oracle and also for any other Linux-based database, although you would still need Oracle Clusterware (which is free).
We're migrating our environment over to AWS from a colo facility. As part of that we are upgrading our two SQL Server 2005 servers to 2014. The two are currently mirrored and we'd like to keep it that way or find other ways to make the servers redundant. Transaction volume/server use is light for our app - but it's in production, requires high availability, and, as a result, requires some kind of failover.
We have already set up one EC2 instance and put SQL Server 2014 on it (as opposed to using RDS, for licensing reasons) and are now exploring what to do next to achieve this.
What suggestions do people have to achieve the redundancy we need?
I've seen two options thus far from here and googling around. I list them below - we're very open to other options!
First, use the RDS mirroring option, but I can't tell if that only applies when the principal server is also on RDS - it also doesn't help with licensing.
Second, use multiple availability zones. What are the pros/cons of this versus using different regions altogether (e.g., bandwidth issues) etc? And does multi-AZ actually give redundancy (if AWS goes down in Oregon, for example, then doesn't everything go down)?
Thanks for the help!
The Multi-AZ capability of Amazon RDS (Relational Database Service) is designed to offer high-availability for a database.
From Amazon RDS Multi-AZ Deployments:
When you provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. In case of an infrastructure failure (for example, instance hardware failure, storage failure, or network disruption), Amazon RDS performs an automatic failover to the standby, so that you can resume database operations as soon as the failover is complete. Since the endpoint for your DB Instance remains the same after a failover, your application can resume database operation without the need for manual administrative intervention.
Multiple Availability Zones are recommended to improve availability of systems. Each AZ is a separate physical facility such that any disaster that should befall one AZ should not impact another AZ. This is normally considered sufficient redundancy rather than having to run across multiple Regions. It also has the benefit that data can be synchronously replicated between AZs due to low-latency connections, while this might not be possible between Regions since they are located further apart.
One final benefit... The Multi-AZ capability of Amazon RDS can be activated by simply selecting "Yes" when the database is launched. Running your own database and using mirroring services requires you to do considerably more work on an on-going basis.
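For reference, enabling it programmatically is a single flag; a minimal boto3 sketch (the identifier, instance class, credentials, and storage size are placeholders, and the engine/edition must match your licensing):

    import boto3

    rds = boto3.client("rds")

    # MultiAZ=True is the equivalent of selecting "Yes" in the console
    rds.create_db_instance(
        DBInstanceIdentifier="my-sqlserver-db",   # placeholder
        Engine="sqlserver-se",                    # Standard Edition; pick the edition you license
        DBInstanceClass="db.m5.large",            # placeholder size
        AllocatedStorage=200,                     # GiB, placeholder
        MasterUsername="admin",
        MasterUserPassword="change-me",           # placeholder; use Secrets Manager in practice
        MultiAZ=True,
        LicenseModel="license-included",
    )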
I'm researching cloud services to host an e-commerce site. And I'm trying to understand some basics on how they are able to scale things.
From what I can gather from AWS, Rackspace, etc documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources or make replicas of that instance to handle more traffic. And it seems like you can install a database local to these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be accessed through a single access point, with data synchronized across each replica in the background. This could be the wrong mental image, but it's the best I've got right now.
I can understand how setup 2 can be scalable. The webserver instances don't change at all since they just hold the source code, so all the HTTP requests are distributed across the different webserver instances and load balanced. The data queries go through a single access point, are distributed across the different database instances and load balanced, and all the data writes are synced between all database instances in a way that is transparent to the application/webserver instance(s).
But for setup 1, where there is a database setup locally within each webserver instance, how is the data able to be synchronized across the other databases local to the other web server instances? Since the instances of each webserver can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content where the data inside the database is not getting changed? So with an e-commerce site where orders are written to the database, this architecture will just not be feasible? Or is there some way to get each webserver instance to update their local database to some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server set up to be the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three but scalable). The way we set it up is with "k-safe" replication. This is scalable as the data is distributed across the machines, and duplicated such that one machine could disappear entirely and the site would continue to function. This also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right: if you duplicate the entire database on each machine behind a load balancer, you need to worry about replicating the data between the nodes. This is complex and takes a toll on performance, or you'll need to sacrifice consistency, or synchronize everything to a single big database, in which case you lose the benefit of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput on demand.
Relating to setup #2, scaling the application is easy and the cloud providers do that for you automatically, but the database becomes the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput from the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option, but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster, which is complex; you won't have auto scaling for your data.
Another way to do this is a cloud database as a service; there are several options on both the Amazon and Rackspace clouds. You mentioned RDS, but it has the same issue because in the end it is limited to one database instance with no auto scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes and has a load balancer that manages the connections between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends requests to up to thousands of database nodes. So this might answer your need for a single access point and database scalability, without needing to set up a cluster or change it every time there is a scale operation.
I'm wondering if Amazon EC2+EBS can handle large Oracle databases (7TB to start with). Sounds like EBS can have storage volumes of up to 1TB and I could have many storage volumes attached to the same EC2 instance, but is it possible then to configure Oracle to use those storage volumes so that the database can grow to 7TB and beyond?
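To illustrate the approach I have in mind, attaching several volumes to one instance looks straightforward at the API level; a rough boto3 sketch (the AZ, instance ID, sizes, and device names are placeholders):

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"   # placeholder
    az = "us-east-1a"                     # must match the instance's AZ

    # Create and attach several ~1TB volumes; Oracle (e.g. via ASM or LVM striping)
    # would then be configured on top of these block devices.
    device_names = ["/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi"]
    for device in device_names:
        volume = ec2.create_volume(AvailabilityZone=az, Size=1000, VolumeType="gp2")
        ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
        ec2.attach_volume(VolumeId=volume["VolumeId"], InstanceId=instance_id, Device=device)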
To pursue this I would bring in Oracle DBAs to assist, but I want to figure out if this is even a valid approach, or should we look elsewhere?
What other options are there for large (7-15 TB) databases in the cloud?
Yes, you can. But it can be painful. For databases this size you want: tape backup, fast storage, and, most importantly, Automatic Storage Management (ASM).
When using ASM, you can run the Oracle processes in the cloud, but not the storage. It is not really possible to use ASM in the cloud: it relies on specific hardware instructions to make storage fast, and the VM layer would get in the way and make it too slow.
Running Oracle without ASM for 5TB+ of data is not practical.
Important:
If you have 5TB of DATA, you need AT LEAST 13TB of disk space to run an HA Oracle instance.
In my company we run a 15TB Oracle database in the cloud, but we hired dedicated storage devices. You can't do that with Amazon (try mediatemple).