What are the specifications of a Snowflake server?

When it comes to cluster sizes, the documentation states that, for example, an XS cluster consists of 1 server. But I have been looking for the server specifications and I can't seem to find any documentation on this topic.

For Snowflake on AWS, if you open the browser console in Chrome (F12) and run this SQL in the normal Snowflake window:
SELECT 1/0;
you will get an error. The network response for that message contains a lot of information, but one part states the server EC2 instance type:
"warehouseServerType" : "c5d.2xlarge",
which was for an X-Small warehouse (for reference, a c5d.2xlarge has 8 vCPUs, 16 GiB of RAM, and a 200 GB NVMe SSD).

There are no apparent publicly available specs for the WAREHOUSE building blocks, other than that, on AWS, the compute nodes are EC2 servers. Most likely an 8-thread CPU, 30-40 GB of RAM (higher on Azure), and maybe around 250 GB of SSD. Something like m5d.2xlarge, but I'm only guessing.
Importantly, those figures stack up when increasing the WAREHOUSE size, making most storage-bound queries run faster.
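As a rough, unofficial illustration of that stacking, here is a minimal sketch using the snowflake-connector-python package to resize a warehouse and time the same scan at two sizes. The account, credentials, warehouse, table, and column names are all placeholders, and the timings are only indicative:

import time
import snowflake.connector

# Placeholder credentials -- replace with your own account details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

def timed(sql):
    """Run a statement and return elapsed wall-clock seconds."""
    start = time.monotonic()
    cur.execute(sql)
    cur.fetchall()
    return time.monotonic() - start

for size in ("XSMALL", "MEDIUM"):
    # Each size step roughly doubles the node count, so storage-bound
    # scans tend to speed up accordingly (at proportionally more credits).
    cur.execute(f"ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = '{size}'")
    # BIG_TABLE and C1 are placeholders; COUNT(DISTINCT ...) forces a scan.
    print(size, timed("SELECT COUNT(DISTINCT c1) FROM big_table"))

Note that Snowflake's result cache can mask the difference; for a real test you would first run ALTER SESSION SET USE_CACHED_RESULT = FALSE.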

Related

Server Slowness

We created an ASP.NET MVC web application using C#, with a SQL Server database, hosted on an AWS server.
The issue is that when we increase the load on the web application, the whole application goes down. When we check the monitoring graph on AWS, we find that CPU utilization is very high, around 70 to 80%; normally it is around 20 to 25%. There is also a strange issue: every day at 5 AM UTC it starts working properly again.
I tried to optimize my stored procedures and increase the AWS server capacity, and I checked Activity Monitor in SQL Server, but I could not find out why it goes down every day under load. What might be the reason?
Here are the related screenshots, please review:
[screenshot: CPU Utilization Graph]
[screenshot: Database Activity Monitor]
Please let me know how to find out the cause of these issues.
Providing some basic insights here: the three most common causes of server slowdown are CPU, RAM, and disk I/O. High CPU use can slow down the host as a whole and make it hard to complete tasks on time. top and sar are two tools you can use to check CPU usage.
Since you are using stored procedures, if one takes a long time in its queries, this will result in I/O waits, which ultimately slow the server down. A starting-point troubleshooting query is sketched below.
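Here is a minimal sketch of that kind of troubleshooting, assuming Python with the pyodbc package and a standard SQL Server DMV query that lists the cached statements consuming the most CPU and I/O; the connection string is a placeholder:

import pyodbc

# Placeholder connection string -- point it at your own SQL Server instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=mydb;UID=user;PWD=secret"
)

# sys.dm_exec_query_stats aggregates statistics for cached query plans;
# sorting by total worker time surfaces the biggest CPU consumers.
sql = """
SELECT TOP 10
    qs.total_worker_time / 1000 AS cpu_ms,
    qs.total_logical_reads      AS logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, 1, 200)  AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
"""

for row in conn.cursor().execute(sql):
    print(row.cpu_ms, row.logical_reads, row.execution_count, row.query_text)

If the same stored procedure tops this list every day around the slowdown window, its plan or indexes are the first place to look.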

Server specs of Virtual warehouse servers in Snowflake

I know that Snowflake takes away the headache of managing servers and sizing with its Virtual Warehouse concept, but I wanted to know the physical specs of each individual server that Snowflake uses as part of its Virtual Warehouse (VW) clusters. Can someone help?
There's no official documentation for the physical specs of each warehouse.
Part of Snowflake's goal is to keep warehouse performance equivalent across the 3 supported clouds. The specific hardware used for each instance will keep changing as Snowflake works to optimize users' experience and the offerings of each platform.
https://docs.snowflake.com/en/user-guide/warehouses-overview.html
In any case, this question seems to be a duplicate of What are the specifications of a Snowflake server?.

ec2 as bastion host and its performance

I have a Redshift cluster in a private subnet and an EC2 instance in a public subnet that is just a bastion host for my Redshift. All is working well and I can connect to my Redshift through the internet (SSH).
Now I want to redo this setup in the production environment, and I need to choose an EC2 instance type (nano, micro, etc.). I have a doubt whether my EC2 instance's performance depends on the query's data transfer size. That is, let's say my Redshift returns a huge amount of data for a query; will the EC2 instance throttle the performance?
Basically, I don't want my EC2 instance to be a performance bottleneck, and I am not sure whether it will be. Any thoughts?
Thanks in advance!
Firstly, you can change the Instance Type of an Amazon EC2 instance at any time. Just stop the instance, change the Instance Type and start it again. So, start with t2.nano and make it bigger if you find any performance problems.
Secondly, your use-case will consume very little RAM and very little CPU. You can look at Amazon CloudWatch metrics to monitor CPU utilization and you can use operating system tools to monitor memory (or use Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances).
Bottom line: Measure and monitor your existing environment and the production environment. Change Instance Type as necessary. Don't sweat it.
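A minimal sketch of that resize-and-monitor advice, assuming Python with boto3 and a placeholder instance ID (the instance must be stopped before its type can be changed):

from datetime import datetime, timedelta
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder -- use your own instance ID
ec2 = boto3.client("ec2")

# Resize: stop the instance, change its type, start it again.
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])
ec2.modify_instance_attribute(
    InstanceId=INSTANCE_ID, InstanceType={"Value": "t2.small"}
)
ec2.start_instances(InstanceIds=[INSTANCE_ID])

# Monitor: average CPU over the last day, via CloudWatch.
cw = boto3.client("cloudwatch")
stats = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), "%")

If the CPU average stays low even while large result sets flow through the tunnel, the bastion size is fine.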

Microsoft Access database - queries run on server or client?

I have a Microsoft Access .accdb database on a company server. If someone opens the database over the network, and runs a query, where does the query run? Does it:
run on the server (as it should, and as I thought it did), and only the results are passed over to the client through the slow network connection
or run on the client, which means the full 1.5 GB database is loaded over the network to the client's machine, where the query runs, and produces the result
If it is the latter (which would be truly horrible and baffling), is there a way around this? The weak link is always the network; can I have queries run on the server somehow?
(Reason for asking is the database is unbelievably slow when used over network.)
The query is processed on the client, but that does not mean that the entire 1.5 GB database needs to be pulled over the network before a particular query can be processed. Even a given table will not necessarily be retrieved in its entirety if the query can use indexes to determine the relevant rows in that table.
For more information, see the answers to the related questions:
ODBC access over network to *.mdb
C# program querying an Access database in a network folder takes longer than querying a local copy
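To illustrate the index point above, here is a minimal sketch, assuming Python with pyodbc, the Access ODBC driver, and a hypothetical Orders table in the shared .accdb file. With an index on the filtered column, the client-side Jet/ACE engine can fetch just the index pages and the matching rows over the network instead of the whole file:

import datetime
import pyodbc

# Placeholder UNC path -- point it at your shared .accdb file.
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=\\server\share\company.accdb"
)
cur = conn.cursor()

# One-time: index the column used in the WHERE clause.
cur.execute("CREATE INDEX idx_orders_date ON Orders (OrderDate)")
conn.commit()

# This query now reads index pages plus matching rows over the network,
# not the entire 1.5 GB file.
cur.execute(
    "SELECT * FROM Orders WHERE OrderDate >= ?",
    datetime.date(2023, 1, 1),
)
print(len(cur.fetchall()))

The engine still runs on the client either way; indexes only reduce how much of the file crosses the wire.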
It is the latter: the 1.5 GB database is loaded over the network.
The "server" in your case is a server only in the sense that it serves the file, it is not a database engine.
You're in a bad spot:
The good thing about Access is that it's easy for people who are not developers to create forms, reports, and the like. The bad is everything else about it, particularly two things:
People wind up using it for small projects that grow and grow and grow, and wind up in your shoes.
It sucks for multiple users, and it really sucks over a network when it gets big
I always convert them to a web-based app with SQL server or something, but I'm a developer. That costs money to do, but that's what happens when you use a tool that does not scale.

Cloud Architecture

I'm researching cloud services to host an e-commerce site, and I'm trying to understand some basics of how they are able to scale things.
From what I can gather from the AWS, Rackspace, etc. documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources, or make replicas of that instance to handle more traffic. And it seems like you can install a database locally on each of these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be accessed through a single access point, with data synchronized across each replica in the background. This could be the wrong mental image, but it's the best I've got right now.
I can understand how setup 2 can be scalable. Webserver instances don't change at all, since they just hold the source code, so all the HTTP requests are distributed across the different webserver instances and load balanced. The data queries have a single access point, are distributed across the different database instances and load balanced, and all the data writes are synced between all database instances, transparently to the application/webserver instance(s).
But for setup 1, where a database is set up locally within each webserver instance, how is the data synchronized across the databases local to the other webserver instances? Since the instances of each webserver can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content, where the data inside the database is not changing? So with an e-commerce site, where orders are written to the database, would this architecture just not be feasible? Or is there some way to get each webserver instance to reconcile its local database with some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server set up as the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three, but scalable). The way we set it up is with "k-safe" replication. This is scalable, as the data is distributed across the machines and duplicated such that one machine could disappear entirely and the site would continue to function. This also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right: if you duplicate the entire database on each machine behind the load balancer, you need to worry about replicating the data between the nodes. That is complex and takes a toll on performance, or you sacrifice consistency, or you synchronize everything to a single big database and then lose the effect of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput on demand.
Relating to setup #2, here scaling the application is easy and the cloud providers do that for you automatically, but the database will become the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput for the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option, but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster, which is complex; you won't have auto-scaling for your data.
Another way to do this is a cloud database as a service; there are several options on both the Amazon and Rackspace clouds. You mentioned RDS, but it has the same issue because in the end it's limited to one database instance with no auto-scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes; a load balancer manages the connections between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends requests to up to thousands of database nodes. So this might answer your need for a single access point and scalability of the database, without needing to set up a cluster or change it every time there is a scale operation.
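To make the single-access-point idea concrete, here is a minimal sketch (hypothetical hostnames, Python standard library only) of round-robin dispatch across stateless web instances that all share one database endpoint; this is why, in setup #2, the web tier scales without any data synchronization between web nodes:

import itertools

# Setup #2: every web instance is stateless and points at the same
# single database access point, so adding web instances is trivial.
SHARED_DB_ENDPOINT = "db.example.internal:5432"  # hypothetical
WEB_INSTANCES = [f"web-{i}.example.internal" for i in range(3)]

# At its simplest, a load balancer is round-robin dispatch.
dispatch = itertools.cycle(WEB_INSTANCES)

def route(request_id: int) -> str:
    """Pick the next web instance; all of them share one database."""
    instance = next(dispatch)
    return f"request {request_id} -> {instance} (db: {SHARED_DB_ENDPOINT})"

for rid in range(6):
    print(route(rid))

In setup #1, each entry in WEB_INSTANCES would carry its own database, and writes landing on different nodes would diverge, which is exactly the synchronization problem described above.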
