When running Informatica IICS jobs that load into Snowflake, the batch warehouse is sized for the biggest jobs. Is there a way within Informatica to alter the warehouse size so that we can run most jobs on a smaller warehouse but scale it up when we run just the big jobs? This would let us size the batch warehouse more economically and still scale it automatically when needed.
Yes - just use a pre-SQL command to resize the warehouse before the task/job runs.
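For example, the pre-SQL on the large task could be an ALTER WAREHOUSE like the one below, with a matching post-SQL to scale back down afterwards (the warehouse name BATCH_WH and the sizes are placeholders, not anything from your setup):
-- pre-SQL on the big job: scale the warehouse up (BATCH_WH is a placeholder name)
ALTER WAREHOUSE BATCH_WH SET WAREHOUSE_SIZE = 'XLARGE';
-- post-SQL once the job finishes: scale back down for the regular workload
ALTER WAREHOUSE BATCH_WH SET WAREHOUSE_SIZE = 'SMALL';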
Hi, I am trying to understand how Snowflake warehouses work.
I have opened three worksheets and run the commands below in each of them:
use role testrole;
use warehouse testwh;
use database testdb;
use schema testschema;
select * from testtable; -- has 30k rows
Now, will it create three separate warehouse instances of testwh and run the above commands inside each of them,
or will it create only one warehouse instance of testwh and run the queries in parallel?
Are warehouse instances created per worksheet, or can multiple worksheets share the same warehouse instance?
No, it will always use testwh.
And yes, all worksheets will use the same warehouse unless you switch with a USE WAREHOUSE command.
Warehouse instances are independent of worksheets and parallel queries. A warehouse can run up to hundreds of parallel queries, so if you open 101 worksheets, run SQL in all of them at the same time, and the warehouse tops out at 100 concurrent queries, the 101st query will wait in the queue until one is freed. However, you can auto-scale using a multi-cluster warehouse, which scales out and back in according to the number of queries, etc.
Warehouses can be created either through the UI or with a CREATE WAREHOUSE statement by an admin.
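As a sketch, a multi-cluster warehouse for auto-scaling could be defined like this (the name, size, and cluster limits below are example values, not anything from the question):
-- example multi-cluster warehouse; name, size, and limits are illustrative
CREATE WAREHOUSE testwh_multi
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;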
I came across this question and was puzzled.
How do you determine the size of the virtual warehouse used for a task?
A. A root task may be executed concurrently (i.e., multiple instances); it is recommended to leave some margin in the execution window to avoid missing instances of execution.
B. Querying (SELECT) the size of the stream contents helps determine the warehouse size. For example, if querying a large stream's contents, use a larger warehouse size.
C. If using a stored procedure to execute multiple SQL statements, it's best to test-run the stored procedure separately first to size the compute resources.
D. Since the task infrastructure is based on running the task body on a schedule, it's recommended to configure the virtual warehouse for automatic concurrency handling using a multi-cluster warehouse to match the task schedule.
Check the new "serverless" Snowflake tasks:
https://www.snowflake.com/blog/taking-serverless-to-task/
In that case, Snowflake will automatically determine the best warehouse size.
You can give a hint to Snowflake on what size to start with, using USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE.
"Specifies the size of the compute resources to provision for the first run of the task, before a task history is available for Snowflake to determine an ideal size. Once a task has successfully completed a few runs, Snowflake ignores this parameter setting."
https://docs.snowflake.com/en/sql-reference/sql/create-task.html
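As a rough sketch, a serverless task simply omits the WAREHOUSE clause and can pass that hint (the task name, schedule, and INSERT statement below are made up for illustration):
-- serverless task: no WAREHOUSE specified, Snowflake manages the compute
-- the name, schedule, and statement are placeholders
CREATE TASK my_serverless_task
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO reporting_table SELECT * FROM staging_table;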
The implications on billing are described here:
https://docs.snowflake.com/en/user-guide/admin-serverless-billing.html
We have a stored procedure in an Azure SQL database (Premium pricing tier with 250 DTUs) that processes around 1.3 billion records and inserts the results into tables that we display on a reporting page. The stored procedure takes around 15 minutes to run, and we have scheduled it weekly as an Azure WebJob because we use the same database for writing actual user logs.
But now we want near-real-time reporting (at most 5 minutes of lag), and if I schedule the WebJob to execute the stored procedure every 5 minutes, the load will bring my application down.
Is there any other approach to achieve real-time reporting?
Are there any Azure services available for it?
Can I use Azure Databricks to execute the stored procedure? Will it help?
Yes, you can route read queries to Premium replica databases by adding this to your connection string:
ApplicationIntent=ReadOnly;
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-read-scale-out
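For instance, a full connection string for the reporting workload would look something like this (the server, database, and credentials are placeholders):
Server=tcp:yourserver.database.windows.net,1433;Initial Catalog=yourdb;User ID=reportuser;Password=<secret>;Encrypt=True;ApplicationIntent=ReadOnly;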
Now that SSDs are the de facto standard, can the maintenance jobs that rebuild indexes and update statistics (using Ola Hallengren's jobs, https://ola.hallengren.com/, for instance) be run only on demand instead of regularly, such as weekly, with Microsoft SQL Server 2017 or lower?
If it is still required, what database-size threshold would determine when it is necessary?
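For context, an on-demand approach is usually driven by a fragmentation check rather than by database size; a minimal sketch against sys.dm_db_index_physical_stats, where the 1000-page filter is just an example threshold:
-- list indexes by fragmentation; the page_count filter is an arbitrary example
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id = ips.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;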
I need to install SQL Server for a production environment. There are only two drives in the system: one with 120 GB and another with 50 GB. How should I choose the drives for the user database data files, log files, and tempdb files?
Your question is too broad to have a simple answer.
Take these points into consideration:
What is the size of the user database?
What is the expected growth of the user database?
Do you have a lot of queries with #-tables? (tempdb strain)
What is the expected transaction count per second/minute?
Use SQLIO to measure the speed of your drives.
Do you have load tests ready? (Run them, watch Resource Monitor, and check the disk queues.)
Which recovery model are you using? (This determines the growth of your log files.)
Which backup strategy are you planning?
It is equally possible that:
you don't have to worry and can leave all DBs in the default location, or
you need faster hardware.
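If you do end up splitting files across the two drives, the placement itself is just a matter of the FILENAME paths at creation time; a minimal sketch, assuming placeholder database name, drive letters, and sizes:
-- data file on one drive, log file on the other; all names, paths, and sizes are placeholders
CREATE DATABASE SalesDb
ON PRIMARY
    ( NAME = SalesDb_data,
      FILENAME = 'D:\SQLData\SalesDb_data.mdf',
      SIZE = 10GB, FILEGROWTH = 1GB )
LOG ON
    ( NAME = SalesDb_log,
      FILENAME = 'E:\SQLLogs\SalesDb_log.ldf',
      SIZE = 2GB, FILEGROWTH = 512MB );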