Does Snowflake warehouse base on virtual machine like EC2? - snowflake-cloud-data-platform

Is Snowflake warehouse based on virtual machine like EC2? I mean is every Snowflake warehouse a EC2? It so, it is impossible for warehouse to resume so fast. If it is not, is it based on container or something?
Thanks
Steven

The compute (Virtual Warehouse) layer of Snowflake uses whatever compute is available on the platform the account is set up on. One of the good things about Snowflake is that you don't have to concern yourself with these details, It's probably something like:
AWS -> EC2
GCP -> Compute Engine
Azure -> Virtual Machines
From what I understand, Snowflake maintain a pool of servers. When you shut down your Virtual Warehouse it doesn't mean that your EC2 instance terminate - they just go back into the pool of servers that can be used by other Snowflake customers.

At a high level a Snowflake DWH is composed of three layers
a.) Storage Layer: In AWS this would typically be S3 storage layer
b.) Compute Layer: These are EC2 machines and are used for the Compute and data load
c.) Web Services: These take care of services like authentication and security
So the compute layer of Snowflake is hosted on EC2.

Related

If I want to use Amazon DynamoDB as database for my mobile app, which Ec2 instance I can opt for?

I want to use dynamoDB as the database for my mobile application. If EC2 instance will perform well, if my mobile application has following:
100k daily active users
1 million daily active users
10 million active users
I am new to AWS ecosystem and i am unable to figure out which instance to choose.
DynamoDb is serverless service, which means that you do not need to provision any EC2 instances to host the database. Its all managed by AWS.
DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don't have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
The only thing that you have to consider is setting up its read/write capacity. However, if you are not sure on this as well, you can use on-demand capacity mode:
Amazon DynamoDB on-demand is a flexible billing option capable of serving thousands of requests per second without capacity planning. DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.

Whats the best redundancy setup on AWS for SQL Server 2014

We're migrating our environment over to AWS from a colo facility. As part of that we are upgrading our 2 SQL Server 2005s to 2014s. The two are currently mirrored and we'd like to keep it that way or find other ways to make the servers redundant. # of transactions/server-use is light for our app - but it's in production, requires high availability, and, as a result, requires some kind of fail over.
We have already setup one EC2 instance and put SQL server 2014 on it (as opposed to using RDBMS for licensing reasons and are now exploring what to do next to achieve this.
What suggestions do people have to achieve the redundancy we need?
I've seen two options thus far from here and googling around. I list them below - we're very open to other options!
First, use RDBMS mirroring service, but I can't tell if that only applies if the principal server is also RDBMS - it also doesn't help with licensing.
Second, use multiple availability zones. What are the pros/cons of this versus using different regions altogether (e.g., bandwidth issues) etc? And does multi-AZ actually give redundancy (if AWS goes down in Oregon, for example, then doesn't everything go down)?
Thanks for the help!
The Multi-AZ capability of Amazon RDS (Relational Database Service) is designed to offer high-availability for a database.
From Amazon RDS Multi-AZ Deployments:
When you provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. In case of an infrastructure failure (for example, instance hardware failure, storage failure, or network disruption), Amazon RDS performs an automatic failover to the standby, so that you can resume database operations as soon as the failover is complete. Since the endpoint for your DB Instance remains the same after a failover, your application can resume database operation without the need for manual administrative intervention.
Multiple Availability Zones are recommended to improve availability of systems. Each AZ is a separate physical facility such that any disaster that should befall one AZ should not impact another AZ. This is normally considered sufficient redundancy rather than having to run across multiple Regions. It also has the benefit that data can be synchronously replicated between AZs due to low-latency connections, while this might not be possible between Regions since they are located further apart.
One final benefit... The Multi-AZ capability of Amazon RDS can be activated by simply selecting "Yes" when the database is launched. Running your own database and using mirroring services requires you to do considerably more work on an on-going basis.

Hosted Databases: How is latency handled?

I read some things about hosted (aka cloud) databases. For example Cloudant offers a hosted CouchDB database or Cassandra.io offers hosted Cassandra. I understand why these services solve some problems.
My question: Why do these services work? I suppose I host my own application on my own servers (or somewhere on a cloud-hosting-platform) and use one of these services to store my data. For every database request (either read or write), I need to pay a full roundtrip over the internet (supposing my application is not hosted in the same place as my database cloud provider uses). Why aren't these roundtrips killing me? When thinking about SQL, every query would cost another x*10ms just for the network, without any time spend.
How is this problem solved? Or are these services not suitable for applications which need fast responses and can only be used for data processing where latency is not an issue?
generally, the physical hosts of hosted database services normally reside in major data-centers (e.g. AWS). In order to reduce network latency, customers can choose whether to host their application on servers that reside in the physical same data-center as their hosted databases reside.
The majority of high-performance applications and/or website that do not use hosted database services usually maintain their application servers and their database servers on separate hosts for performance reason anyways. So, in short, switching to hosted database service would not necessarily increase network latency.

Cloud Architecture

I'm researching cloud services to host an e-commerce site. And I'm trying to understand some basics on how they are able to scale things.
From what I can gather from AWS, Rackspace, etc documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources or make replicas of that instance to handle more traffic. And it seems like you can install a database local to these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be access through a single access point and data is synchronized across each replica in the background. This could be the wrong mental image, but it's the best I got right now.
I can understand how setup 2 can be scalable. Webserver instances don't change at all since it's just the source code. So all the http requests are distributed to the different webserver instances and is load balanced. And the data queries have a single access point and are then distributed to the different database instances and is load balanced and all the data writes are sync'd between all database instances that is transparent to the application/webserver instance(s).
But for setup 1, where there is a database setup locally within each webserver instance, how is the data able to be synchronized across the other databases local to the other web server instances? Since the instances of each webserver can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content where the data inside the database is not getting changed? So with an e-commerce site where orders are written to the database, this architecture will just not be feasible? Or is there some way to get each webserver instance to update their local database to some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server setup to be the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three but scalable). The way we set it up is with a "k-safe" replication. This is scalable as the data is distributed across the machines, and duplicated such that one machine could disappear entirely and the site continues to function. THis also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right, if you duplicate the entire database on each machine with load balancing, you need to worry about replicating the data between the nodes, this will be complex and will take a toll on performance, or you'll need to sacrifice consistency, or synchronize everything to a single big database and then you lose the effect of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput on-demand.
Relating to setup #2, here scaling the application is easy and the cloud providers do that for you automatically, but the database will become the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput for the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster which is complex, you won't have auto scaling for your data.
Another way to do this is a cloud database as a service, there are several options on both the Amazon and RackSpace clouds. You mentioned RDS but it has the same issue because in the end it's limited to one database instance with no auto-scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes, and there is a load balancer that manages the connection between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends the requests to up to thousands of database nodes. So this might answer your need for a single access point and scalability of the database, without needing to setup a cluster or change it every time there is a scale operation.

Amazon AWS:: Which service should I use?

I am planning to host my iphone game on amazon aws. Basically my game just need a database, and currently I am using mysql (relational database) to store users data.
I am new to amazon aws, and I have read some of the articles. This page: http://aws.amazon.com/running_databases/ provides some available choices for databases.
RDS (relational database services)
EC2 with Relational Database AMI (it has mysql)
simpleDB
I think I will skip simpleDB, because I have read the sample codes, the database structure is kind of different from relational db, no join tables, all data stored in strings. The current game that I am developing is already in relational form, with all the php codes already, maybe for future project, I could consider it.
Now, left RDS and EC2, which one should I use? In comparison in costs, performances, reliability and stability? My game server requirements:
MySQL database (as I only familiar with this database engine and I already developed the game half way, no time to re-write or learn new language)
Easy to scale
Load balancing
Automatic backup
(if possible, less maintenance works in future)
Please give me some advice, thank you very much.
As you have already chosen MySQL on AWS, the question is only whether you want to host the Database Server on the Instance or through AWS RDS Service.
In comparison in costs, performances, reliability and stability and the your game server requirements:
MySQL database Easy to scale,
Load balancing,
Automatic backup,
(if possible, less maintenance works in future),
AWS RDS would be the BEST option.
As once you scale the Environment, it might be complex and needs lot of processing and maintaining if you host it on the INSTANCE.
While AWS RDS makes it easy for you.
Hope It Helps.. :)
If you need EC2 instance(s) anyway (for web hosting for instance) then hosting MySQL on an EC2 instance that you are already paying for is going to be cheaper...
But as your load goes up I would definitely look towards RDS for easier scaling, reduced admin overhead, better disaster recover story, etc... No reason in my opinion to host MySQL on dedicated EC2 instances...
For your requirement of load balancing and easy scaling you will need a dedicated instance for database. Your EC2 instances hosting your game would be behind load balancer and those will all connect to one database on dedicated instance.
That dedicated instance hosting your database could be RDS or EC2 instance. RDS is expensive but has its benefits.

Resources