Server specs of Virtual Warehouse servers in Snowflake

I know that Snowflake takes away the headache of managing servers and sizing with its Virtual Warehouse concept, but I wanted to know the physical specs of each individual server that Snowflake uses as part of its Virtual Warehouses or VW clusters. Can someone help?

There's no official documentation for the physical specs of each warehouse.
Part of Snowflake's goal is to keep warehouse performance equivalent across the three supported clouds. The specific hardware used for each instance will keep changing as Snowflake works to optimize users' experience and as the offerings of each platform evolve.
https://docs.snowflake.com/en/user-guide/warehouses-overview.html
In any case, this question seems to be a duplicate of "What are the specifications of a Snowflake server?"

Related

In Snowflake which interfaces can be used to create and/or manage Virtual Warehouses?

A. The Snowflake Web Interface (UI)
B. SQL commands
C. Data integration tools
Can data integration tools be used for creating Virtual Warehouses? "A" and "B" are correct, but for "C" I am a little bit confused.
Of course there are integration tools that support the management of Virtual Warehouses.
One example is Pentaho; please see this article in the Hitachi documentation: PDI and Snowflake
I think you might be studying for the SnowPro certification, and this is one of the "practice questions"?
I think a little care might be needed with this one. Snowflake do put a few "trick" questions in these exams.
So, some data integration tools can create and manage Snowflake virtual warehouses. The answer by @MichaelGolos includes a link to PDI's documentation:
In PDI, you can create, modify, and even delete a Snowflake virtual warehouse to help you automate your virtual warehouse scaling activities. These orchestration entries include:
Create Snowflake warehouse
You can use this job entry to create a Snowflake virtual warehouse. You can set size, scaling, automated suspension, and other properties for your warehouse.
Modify Snowflake warehouse
Once you create a warehouse, you can edit its settings using this job entry. Modifying a warehouse is useful if your users typically perform simple queries and only require a small warehouse. However, to meet your ETL service-level agreements (SLA), you may need a larger warehouse during the ETL process. Using this job entry, you can modify the warehouse at the beginning of the ETL process to scale it up, and then modify it to scale it back down when the ETL process is complete.
Delete Snowflake warehouse
Use this job entry to delete virtual warehouses. Deleting unwanted virtual warehouses helps you clean up the Snowflake management console.
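For comparison, the native SQL route (option "B") covers the same create/modify/delete lifecycle. A minimal sketch, where the warehouse name ETL_WH and all settings are illustrative:

    -- Create a warehouse with size, auto-suspend and auto-resume settings
    CREATE WAREHOUSE IF NOT EXISTS ETL_WH
      WITH WAREHOUSE_SIZE = 'XSMALL'
           AUTO_SUSPEND   = 300   -- suspend after 300 seconds of inactivity
           AUTO_RESUME    = TRUE;

    -- Scale up at the start of the ETL window...
    ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'LARGE';

    -- ...and back down when the ETL process is complete
    ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'XSMALL';

    -- Remove a warehouse that is no longer needed
    DROP WAREHOUSE IF EXISTS ETL_WH;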
HOWEVER, pay careful attention to the wording of this question:
"In Snowflake, which interfaces can be used to create and/or manage Virtual Warehouses?"
where the emphasis belongs on "In Snowflake".
The kind of orchestration that PDI and other tools provide is not in Snowflake - it is a third-party tool, and therefore sits outside Snowflake. So, in my opinion, the answer for option "C" is NO.
I’m not entirely sure what the actual question is that you are asking but…
Could a data integration tool manage warehouses? Yes
Do some data integration tools actually do this? Yes
Do all data integration tools do this? No

SQL Server setup: Virtual or Physical?

We are in the process of setting up a new DB server but need some help on whether we should go the virtual hosting route or the physical one. Here is the background info:
Database: SQL Server 2016 with SSRS
It will be centrally hosted with a maximum of around 200 concurrent users, and the system will be accessed by users across the globe. In the future, the number of concurrent users may rise to 300.
The infra team has assured me that they will be setting up a dedicated DB server, but they want to host a virtual server on it because it is more beneficial from a DR point of view.
The development team prefers a physical server because it makes life easier when things go wrong and need investigation.
I hope you can either provide some guidance or point me in the right direction on this dilemma.
Many thanks in advance.
As Steve Matthews noted, you omitted all kinds of crucial details about the app. But in real life the use of virtual machines for production apps is very, very widespread, both on VMware products and on Hyper-V (Microsoft's product). While you may lose 3-10% performance versus running directly on the hardware, there are many, many advantages to running on a virtual machine (admins can easily allocate more memory, CPU, and other resources if they are needed and available).
Another increasingly popular approach is to virtualize using Azure (i.e. the 'cloud'). Here too you can add all kinds of resources as your app's needs change, but of course you will be charged for it. When you go with Azure there are certain parts of the product you cannot run; see http://searchsqlserver.techtarget.com/feature/Why-you-should-think-twice-about-Windows-Azure-SQL-Database
You may also want to see this from Microsoft:
https://azure.microsoft.com/en-us/documentation/articles/sql-database-general-limitations/
But much of the administration, including backup, can be done by Azure, which makes it very attractive to many shops.
Good luck whichever way you go.

DB2 performance analysis tools for developers

As a developer using DB2 for the first time, I'm not familiar with what the best database performance analysis tools are for it.
I'm wondering what others have found useful in terms of tools that come with DB2, and any third-party tools available for it.
For example, is anything better than the others for things like query planning, CPU measurement, index usage, etc.?
You don't specify which version/release of DB2 you're running, or whether you're running the mainframe (z/OS) version or DB2 for Linux, UNIX, and Windows (also known as DB2 for LUW).
If you're running DB2 on z/OS, talk to your DBA and you'll find out exactly which monitoring and analysis tools have been licensed.
If it's DB2 for LUW you're using, there are various structures and routines you can access directly in DB2 to capture detailed performance information. IBM adds more of these features with each new DB2 release (e.g. version 9.5 vs. 9.7), so be sure to access the version of the documentation for your specific release. Here is the monitoring guide for 9.5 and here is the 9.7 monitoring guide.
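For instance, on DB2 for LUW 9.7 you can query the lightweight monitoring table functions directly from SQL. A small sketch (the columns shown are a representative subset of what MON_GET_CONNECTION returns):

    -- Per-connection CPU and row activity, heaviest CPU consumers first;
    -- NULL = all connections, -2 = all database members
    SELECT application_handle,
           total_cpu_time,
           rows_read,
           rows_returned
    FROM TABLE(MON_GET_CONNECTION(NULL, -2)) AS t
    ORDER BY total_cpu_time DESC;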
The challenge will be to capture and analyze that performance data in some useful way. BMC, CA, DBI, IBM, and even HP have very good third-party tools to help you do that. Some of them are even free.
On the open-source side, monitors from GroundWork Open Source and Hyperic HQ Open Source have some DB2 support, but you'll need to spend some time configuring either of those environments to access your DB2 server.
Many of the tools mentioned above track some combination of DB2 health and performance indicators, and may even alert you when something about DB2 or its underlying server has entered a problem status. You will face choices over what to use as the criteria for triggering alerts, versus the KPIs you simply want to capture without ever alerting.
There are a lot of monitoring tools out there that can be taught how to watch DB2, but one of the most versatile and widely used is RRDtool, either on its own with a collection of custom DB2 scripts, or as part of a Cacti or Munin installation, which automates many (but not all) aspects of working with RRDtool. The goal of RRDtool is to capture any kind of numeric time-series data so it can be rendered into various graphs; it has no built-in alerting capabilities. Implementing RRDtool involves choosing and describing the data points you intend to capture and allocating RRDtool data files to store them. I use it a lot to identify baseline performance and resource-utilization trends for a database or an application. The PNG bitmaps it produces can be integrated into a wide variety of IT dashboards, provided those dashboards are customizable.

Virtualize the database server or the web server?

In a web application architecture with 1 app server (IIS) and 1 database server (MSSQL), if you had to pick one server to virtualize in a VM, which would it be: web or db?
Generally speaking of course.
Web of course.
Databases require too much I/O bandwidth. Also, it's easier to add instances or databases to a single database server, whereas web servers benefit more from being isolated.
Similar question... "Run Sharepoint 2003 on VMWare?". Sharepoint is just an asp.net application with a SQL Server back end.
The shortcoming of most virtual environments is I/O, especially disk. SQL is very I/O intensive. I am using a shared virtual host and the slow I/O is killing me.
That said, Microsoft is pushing SQL on Hyper-V. It's a hypervisor, which means it's a thinner layer between the VM and the hardware, and the drivers are quasi-native. Here's their whitepaper: http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/SQL2008inHyperV2008.docx
Looks like for SQL, you will lose ~10% performance overall. The upside is that you can move the whole instance to another box quickly, bump up the RAM, etc.
Another thing to consider is Intel's enterprise SSD drives (X25-E). I imagine that would help a lot in a virtual environment. Pricey, of course.
I would probably decide depending on the amount of computation required by the app server versus the amount of computation/IO done by the database.
With that said I would think most of the time the DB should NOT be virtualized. Virtualization isn't too hot for db's that have to ensure that data remains nice and safe on a disk, and adding another abstraction layer can't help with that.
If you have two physical servers there is no need to virtualise - use one server for IIS and one for the database.
If you have one physical server there is also no need to virtualise.
If I had to choose, it would be the web server. The database would benefit in terms of performance by running on a physical server. If the web server is virtualised, it makes it quick and easy to clone it to create a cluster of web servers.
With today's hypervisors and best practices you can virtualise both infrastructures. When you virtualise your DB infrastructure it is best to ensure that the DB is installed on a SAN-based system so that IO performance is not a bottleneck.
As with everything there are the right and wrong way of doing things but following vendor best practices and testing will enable you to squeeze the best performance out of your VM instances.
There are plenty of whitepapers and performance testing from the various vendors should you want to virtualise your entire infrastructure.
Even though virtualisation is again an industry hot topic, with various vendors now giving away hypervisors for free, this does not mean that virtualisation is the way forward. Server consolidation, yes; performance enhancing, maybe - YMMV.

SQL Server and Oracle, which one is better in terms of scalability? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
MS SQL Server and Oracle, which one is better in terms of scalability?
For example, if the data size reaches 500 TB.
Both Oracle and SQL Server are shared-disk databases, so they are constrained by disk bandwidth for queries that table-scan over large volumes of data. Products such as Teradata, Netezza or DB2 Parallel Edition are 'shared nothing' architectures where the database stores horizontal partitions on the individual nodes. This type of architecture gives the best parallel query performance, as the local disks on each node are not constrained through a central bottleneck on a SAN.
Shared-disk systems (such as Oracle Real Application Clusters or clustered SQL Server installations) still require a shared SAN, which has constrained bandwidth for streaming. On a VLDB this can seriously restrict the table-scanning performance that is possible to achieve. Most data warehouse queries run table or range scans across large blocks of data. If the query will hit more than a few percent of rows, a single table scan is often the optimal query plan.
Multiple local direct-attach disk arrays on nodes gives more disk bandwidth.
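As a rough worked example (the numbers are illustrative, not benchmarks): table-scanning 50 TB through a shared SAN that can stream 5 GB/s takes about 10,000 seconds, nearly three hours; spread the same data across 20 shared-nothing nodes with 1 GB/s of local disk bandwidth each (20 GB/s aggregate) and the scan completes in about 2,500 seconds.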
Having said that, I am aware of an Oracle DW shop (a major European telco) that has an Oracle-based data warehouse that loads 600 GB per day, so the shared-disk architecture does not appear to impose insurmountable limitations.
Between MS-SQL and Oracle there are some differences. IMHO Oracle has better VLDB support than SQL Server for the following reasons:
Oracle has native support for bitmap indexes, which are an index structure suitable for high-speed data warehouse queries (illustrated in the sketch after this list). They essentially trade CPU for I/O, as they are run-length encoded and use relatively little space. On the other hand, Microsoft claims that Index Intersection is not appreciably slower.
Oracle has better table partitioning facilities than SQL Server. IIRC, table partitioning in SQL Server 2005 can only be done on a single column (also illustrated below).
Oracle can be run on somewhat larger hardware than SQL Server, although one can run SQL server on some quite respectably large systems.
Oracle has more mature support for materialized views and query rewrite to optimise relational queries. SQL2005 does have some query-rewrite capability, but it is poorly documented and I haven't seen it used in a production system. However, Microsoft will suggest that you use Analysis Services, which does actually support shared-nothing configurations.
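To make the bitmap index, partitioning, and query rewrite points concrete, here is a minimal sketch of each feature; all object names are illustrative:

    -- Oracle: bitmap index on a low-cardinality dimension column
    CREATE BITMAP INDEX sales_region_bix
      ON sales (region_code);

    -- Oracle: materialized view the optimizer can use via query rewrite
    CREATE MATERIALIZED VIEW sales_by_region_mv
      ENABLE QUERY REWRITE
      AS SELECT region_code, SUM(amount) AS total_amount
         FROM sales
         GROUP BY region_code;

    -- SQL Server 2005: partitioning is declared over exactly one column
    CREATE PARTITION FUNCTION pf_sales_by_year (DATETIME)
      AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01');

    CREATE PARTITION SCHEME ps_sales_by_year
      AS PARTITION pf_sales_by_year ALL TO ([PRIMARY]);

    CREATE TABLE sales (
      sale_id   INT      NOT NULL,
      sale_date DATETIME NOT NULL,
      amount    MONEY    NOT NULL
    ) ON ps_sales_by_year (sale_date);  -- the single partitioning column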
Unless you have truly biblical data volumes and are choosing between Oracle and a shared-nothing architecture such as Teradata, you will probably see little practical difference between Oracle and SQL Server. Particularly since the introduction of SQL2005, the partitioning facilities in SQL Server are viewed as good enough, and there are plenty of examples of multi-terabyte systems that have been successfully implemented on it.
When you are talking 500TB, that is (a) big and (b) specialized.
I'd be going to a consultancy firm with appropriate specialists to look at the existing skill sets, integration with existing technology stacks, expected usage, backup/recovery/DR requirements....
In short, it's not the sort of project I'd be heading into based on opinions from Stack Overflow. No offence intended, but there are simply too many factors to take into account, a lot of which would be business confidential.
Whether Oracle or MSSQL will scale / perform better is question #15. The data model is the first make-it-or-break-it item regardless of whether you're running Oracle, MSSQL, Informix or anything else. Data model structure, what kind of application it is, how it accesses the db, and which platform your developers know well enough to target for a large system are the first questions you should ask yourself.
I've worked as a DBA on Oracle (although some years back) and I use MSSQL extensively now, although not as a formal DBA. My advice would be that in the vast majority of cases both will meet everything you can throw at them, and your performance issues will be much more dependent upon database design and deployment than the underlying characteristics of the products, which in both cases are absolutely and utterly solid (MSSQL is the best product that MS makes in many people's opinion, so don't let the usual perception of MS blind you on that).
Myself I would tend towards MSSQL unless your system is going to be very large and truly enterprise level (massive numbers of users, multiple 9's uptime etc.) simply because in my experience Oracle tends to require a higher level of DBA knowledge and maintenance than MSSQL to get the best out of it. Oracle also tends to be more expensive, both for initial deployment and in the cost to hire DBAs for it. OTOH if you are looking at an enterprise system then Oracle would have the edge, not least because if you can afford it their support is second to none.
I have to agree with those who said design was more important.
I've worked with super-fast and super-slow databases of many different flavors (the absolute worst being an Oracle database, but it wasn't Oracle's fault). The design of the database and how you decide to index it, partition it, and query it have far more to do with scalability than whether the product is MS SQL Server or Oracle.
I think you may more easily find Oracle DBAs with terabyte-database experience (running a large database is a specialty, just like knowing a particular flavor of SQL), but that could depend on your local area.
Oracle people will tell you Oracle is better; SQL Server people will tell you SQL Server is better.
I say they scale pretty much the same. Use what you know better. There are databases out there of that size on Oracle as well as SQL Server.
When you get to OBSCENE database sizes (where over 1 TB is really big enough, and 500 TB is frigging massive), then operational support must come very high up on the list of requirements. With that much data, you don't mess about with penny-pinching system specifications.
How are you going to back up a system of that size? Upgrade the OS and patch the database? Are scalability and reliability a concern?
I have experience of both Oracle and MS SQL, and for the really, really big systems (users, data or importance), Oracle is better designed for operational support and data management.
Ever tried to back up and restore a 1 TB+ SQL Server database split over multiple databases on multiple instances, with transaction log files being spat out everywhere by each database, while trying to keep it all in sync? Good luck with that.
With Oracle, you have ONE database (so I disagree that the "shared nothing" approach is better) with ONE set of REDO logs(1) and one set of archive logs(2), and you can just add extra hardware nodes without changing (i.e. repartitioning) your application and data.
(1) Redo logs are, of course, mirrored.
(2) Archive logs are, of course, stored in multiple locations.
It would also depend on what your application is meant for. If it uses only inserts with very few updates, then I think MSSQL would be more scalable and better in terms of performance. However, if one has lots of updates, then Oracle would scale up better.
I very much doubt that you are going to get an objective answer to that particular question, until you come across anyone that has implemented the same database (schema, data, etc.) on both platforms.
However, given the fact that you can find millions of happy users of both databases, I dare say it's not too much of a stretch to say either will scale just fine (I've seen a snappy SQL 2005 implementation of 300 TB that seemed pretty responsive).
Oracle is like a high-quality manual film camera, which needs the best photographer to take the best picture, while MS SQL is like an automatic digital camera. In the old days, of course, all professional photographers used film cameras; now think about how many professional photographers use automatic digital cameras.