Snowflake Warehouse: Can the MAX WH size be capped? - snowflake-cloud-data-platform

Is it possible to specify the maximum size of a warehouse?
We want to be able to give users the flexibility to size up their warehouse, within limits. For our environment it's unlikely query performance will improve much when going from a MEDIUM to XX-Large warehouse. Yet our users tend to error on the high side, not the low one, resulting in high compute costs.
I know we can limit the number of clusters in a warehouse and we can implement a credit based quota on a warehouse (or account) using a resource monitor, but I'd really like to start with limiting the max size a WH can be set to in the first place.
I'm thinking it could be one more parameter of the WH, such as MAX_SIZE = 'M'.
Thanks,

Assuming there is a role (e.g., SYSADMIN) that creates all warehouses or has modify privileges. Create this procedure with that role and then any user can call this SP passing the warehouse name and desired size. Since the SP executes as owner (vs. caller), the user calling the SP doesn't need to have MODIFY privileges on the warehouse, just their role must have usage rights on the SP.
CREATE OR REPLACE PROCEDURE utl.arch_set_wh_size_sp(P_WH_NM VARCHAR, P_WH_SIZE VARCHAR)
/****************************************************************************************\
DESC: set WH size
YY-MM-DD WHO CHANGE DESCRIPTION
-------- ------------ ----------------------------------------------------------------
19-12-13 eroesch Initial design
\****************************************************************************************/
RETURNS STRING
LANGUAGE JAVASCRIPT
AS $$
var result = "";
var sqlCmd = "";
var sqlStmt = "";
var rs = "";
var curSize = "";
var whSizesAllowed = ["X-SMALL", "XSMALL", "SMALL", "MEDIUM"];
try {
// first validate the warehouse exists and get the current size
sqlCmd = "SHOW WAREHOUSES LIKE '" + P_WH_NM + "'";
sqlStmt = snowflake.createStatement( {sqlText: sqlCmd} );
rs = sqlStmt.execute();
if (sqlStmt.getRowCount() == 0) {
throw new Error('No Warehouse Found by that name');
} else {
rs.next();
curSize = rs.getColumnValue('size').toUpperCase();
}
// next validate the new size is in the acceptable range
if (whSizesAllowed.indexOf(P_WH_SIZE.toUpperCase()) == -1) {
throw new Error('Not a valid Warehouse size');
};
// set Warehouse size
sqlCmd = "ALTER WAREHOUSE " + P_WH_NM + " SET WAREHOUSE_SIZE = :1";
sqlStmt = snowflake.createStatement( {sqlText: sqlCmd, binds: [P_WH_SIZE]} );
sqlStmt.execute();
result = "Resized Warehouse " + P_WH_NM + " from: " + curSize + " to: " + P_WH_SIZE.toUpperCase();
}
catch (err) {
if (err.code === undefined) {
result = err.message
} else {
result = "Failed: Code: " + err.code + " | State: " + err.state;
result += "\n Message: " + err.message;
result += "\nStack Trace:\n" + err.stackTraceTxt;
result += "\nParam:\n" + P_WH_NM + ", " + P_WH_SIZE;
}
}
return result;
$$;

The short answer is No, you cannot put a cap on the size of a warehouse.
A longer answer is to build a process for resetting your warehouse sizes to whatever you deem as their "default" size on a given interval.

While not a perfect solution, you can ask your SE to set the parameter WAREHOUSE_MAX_SIZE to limit the size of warehouses that users in your account can provision.
On request, I have capped the size of warehouses for at least one of my customers using this method and it works just fine.

Related

Assign result from stored procedure to a variable

I have created a stored procedure that returns a create table sql statement; I want to be able to now call that procedure and assign the result to a variable like:
set create_table_statement = call sp_create_stage_table(target_db, table_name);
snowflake will not let me do this, so is there a way I can.
Context
We have just been handed over our new MDP which is built on AWS-S3, DBT & Snowflake, next week we go into production but we have 200+ tables and snowlpipes to code out. I wanted to semi automate this by generating the create table statements based off the tables metadata and then calling the results from that to create the tables. At the moment we're having to run the SQL, copy+paste the results in and then run that, which is fine in dev/pre-production mode when it's a handful of tables. but with just 2 of us it will be a lot of work to get all those tables and pipes created.
so I've found a work around, by creating a second procedure and calling the first one as a se=ql string to get the results as a string - then calling that string as a sql statement. like:
create or replace procedure sp_create_stage_table("db_name" string, "table_name" string)
returns string
language javascript
as
$$
var sql_string = "call sp_get_create_table_statement('" + db_name + "','" + table_name + "');";
var get_sql_query = snowflake.createStatement({sqlText: sql_string});
var get_result_set = get_sql_query.execute();
get_result_set.next();
var get_query_value = get_result_set.getColumnValue(1);
sql_string = get_query_value.toString();
try {
var main_sql_query = snowflake.createStatement({sqlText: sql_string});
main_sql_query.execute();
return "Stage Table " + table_name + " Successfully created in " + db_name + " database."
}
catch (err){
return "an error occured! \n error_code: " + err.code + "\n error_state: " + err.state + "\n error_message: " + err.message;
}
$$;
It is possible to assign scalar result of stored procedure to session variable. Instead:
SET var = CALL sp();
The pattern is:
SET var = (SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())));
Sample:
CREATE OR REPLACE PROCEDURE TEST()
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
RETURN 'Result from stored procedrue';
END;
CALL TEST();
SET variable = (SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())));
SELECT $variable;
-- Result from stored procedrue

Materialized View how to set a refresh time with task?

I am trying to set a task to refresh a materialized view every hour. I have tried this:
I ran and INSERT of new data to the original table. The Materialized View updated instantly
Forcing the table to drop, the undrop the table that makes up the materialized view. It resulted in a full restoration at the specific time - though this would get expensive quickly
Drop table BookInventory;
Undrop table BookInventory;
I could not find anything in documentation on scheduling a creation of a materilaized view. Has anyone done this before?
Another alternative is that you simply create your own "materialized view" via a custom procedure that you can schedule via a task.
the procedure creates a temp table like the current, including grants. Then inserts the data into this table from a view. Finally swap the tables and drop the temp table. Best to create this as a Transient table since there's no need for Time Travel.
CREATE OR REPLACE PROCEDURE utl.arch_create_mview_sp(P_TABLE_NM VARCHAR, P_VIEW_NM VARCHAR)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS $$
var result = "";
var sqlCmd = "";
var rs = "";
var tmpTableNM = P_TABLE_NM + "_tmp";
try {
sqlCmd = "CREATE OR REPLACE TABLE " + tmpTableNM + " LIKE " + P_TABLE_NM + " COPY GRANTS";
snowflake.execute( {sqlText: sqlCmd} );
sqlCmd = "INSERT INTO " + tmpTableNM + " SELECT * FROM " + P_VIEW_NM;
rs = snowflake.execute( {sqlText: sqlCmd} );
rs.next();
result = "rows inserted: " + rs.getColumnValue(1);
sqlCmd = "ALTER TABLE " + P_TABLE_NM + " SWAP WITH " + tmpTableNM;
snowflake.execute( {sqlText: sqlCmd} );
sqlCmd = "DROP TABLE " + tmpTableNM;
snowflake.execute( {sqlText: sqlCmd} );
}
catch (err) {
result = "Failed: Code: " + err.code + " | State: " + err.state;
result += "\n Message: " + err.message;
result += "\nStack Trace:\n" + err.stackTraceTxt;
}
}
return result;
$$;
You can suspend and resume materialized views. But you cannot query a suspended MV. What are you trying to accomplish? You are not going to save money, only defer the cost.

Violation of PRIMARY KEY constraint. Cannot insert duplicate key in object

I inherited a project and I'm running into a SQL error that I'm not sure how to fix.
On an eCommerce site, the code is inserting order shipping info into another database table.
Here's the code that is inserting the info into the table:
string sql = "INSERT INTO AC_Shipping_Addresses
(pk_OrderID, FullName, Company, Address1, Address2, City, Province, PostalCode, CountryCode, Phone, Email, ShipMethod, Charge_Freight, Charge_Subtotal)
VALUES (" + _Order.OrderNumber;
sql += ", '" + _Order.Shipments[0].ShipToFullName.Replace("'", "''") + "'";
if (_Order.Shipments[0].ShipToCompany == "")
{
sql += ", '" + _Order.Shipments[0].ShipToFullName.Replace("'", "''") + "'";
}
else
{
sql += ", '" + _Order.Shipments[0].ShipToCompany.Replace("'", "''") + "'";
}
sql += ", '" + _Order.Shipments[0].Address.Address1.Replace("'", "''") + "'";
sql += ", '" + _Order.Shipments[0].Address.Address2.Replace("'", "''") + "'";
sql += ", '" + _Order.Shipments[0].Address.City.Replace("'", "''") + "'";
sql += ", '" + _Order.Shipments[0].Address.Province.Replace("'", "''") + "'";
sql += ", '" + _Order.Shipments[0].Address.PostalCode.Replace("'", "''") + "'";
sql += ", '" + _Order.Shipments[0].Address.Country.Name.Replace("'", "''") + "'";
sql += ", '" + _Order.Shipments[0].Address.Phone.Replace("'", "''") + "'";
if (_Order.Shipments[0].ShipToEmail == "")
{
sql += ",'" + _Order.BillToEmail.Replace("'", "''") + "'";
}
else
{
sql += ",'" + _Order.Shipments[0].ShipToEmail.Replace("'", "''") + "'";
}
sql += ", '" + _Order.Shipments[0].ShipMethod.Name.Replace("'", "''") + "'";
sql += ", " + shippingAmount;
sql += ", " + _Order.ProductSubtotal.ToString() + ")";
bll.dbUpdate(sql);
It is working correctly, but it is also outputting the following SQL error:
Violation of PRIMARY KEY constraint 'PK_AC_Shipping_Addresses'. Cannot insert
duplicate key in object 'dbo.AC_Shipping_Addresses'. The duplicate key value
is (165863).
From reading similar questions, it seems that I should declare the ID in the statement.
Is that correct? How would I adjust the code to fix this issue?
I was getting the same error on a restored database when I tried to insert a new record using the EntityFramework. It turned out that the Indentity/Seed was screwing things up.
Using a reseed command fixed it.
DBCC CHECKIDENT ('[Prices]', RESEED, 4747030);GO
I'm pretty sure pk_OrderID is the PK of AC_Shipping_Addresses
And you are trying to insert a duplicate via the _Order.OrderNumber?
Do a
select * from AC_Shipping_Addresses where pk_OrderID = 165863;
or select count(*) ....
Pretty sure you will get a row returned.
It is telling you that you are already using pk_OrderID = 165863 and cannot have another row with that value.
if you want to not insert if there is a row
insert into table (pk, value)
select 11 as pk, 'val' as value
where not exists (select 1 from table where pk = 11)
What is the value you're passing to the primary key (presumably "pk_OrderID")? You can set it up to auto increment, and then there should never be a problem with duplicating the value - the DB will take care of that. If you need to specify a value yourself, you'll need to write code to determine what the max value for that field is, and then increment that.
If you have a column named "ID" or such that is not shown in the query, that's fine as long as it is set up to autoincrement - but it's probably not, or you shouldn't get that err msg. Also, you would be better off writing an easier-on-the-eye query and using params. As the lad of nine years hence inferred, you're leaving your database open to SQL injection attacks if you simply plop in user-entered values. For example, you could have a method like this:
internal static int GetItemIDForUnitAndItemCode(string qry, string unit, string itemCode)
{
int itemId;
using (SqlConnection sqlConn = new SqlConnection(ReportRunnerConstsAndUtils.CPSConnStr))
{
using (SqlCommand cmd = new SqlCommand(qry, sqlConn))
{
cmd.CommandType = CommandType.Text;
cmd.Parameters.Add("#Unit", SqlDbType.VarChar, 25).Value = unit;
cmd.Parameters.Add("#ItemCode", SqlDbType.VarChar, 25).Value = itemCode;
sqlConn.Open();
itemId = Convert.ToInt32(cmd.ExecuteScalar());
}
}
return itemId;
}
...that is called like so:
int itemId = SQLDBHelper.GetItemIDForUnitAndItemCode(GetItemIDForUnitAndItemCodeQuery, _unit, itemCode);
You don't have to, but I store the query separately:
public static readonly String GetItemIDForUnitAndItemCodeQuery = "SELECT PoisonToe FROM Platypi WHERE Unit = #Unit AND ItemCode = #ItemCode";
You can verify that you're not about to insert an already-existing value by (pseudocode):
bool alreadyExists = IDAlreadyExists(query, value) > 0;
The query is something like "SELECT COUNT FROM TABLE WHERE BLA = #CANDIDATEIDVAL" and the value is the ID you're potentially about to insert:
if (alreadyExists) // keep inc'ing and checking until false, then use that id value
Justin wants to know if this will work:
string exists = "SELECT 1 from AC_Shipping_Addresses where pk_OrderID = " _Order.OrderNumber; if (exists > 0)...
What seems would work to me is:
string existsQuery = string.format("SELECT 1 from AC_Shipping_Addresses where pk_OrderID = {0}", _Order.OrderNumber);
// Or, better yet:
string existsQuery = "SELECT COUNT(*) from AC_Shipping_Addresses where pk_OrderID = #OrderNumber";
// Now run that query after applying a value to the OrderNumber query param (use code similar to that above); then, if the result is > 0, there is such a record.
To prevent inserting a record that exist already. I'd check if the ID value exists in the database. For the example of a Table created with an IDENTITY PRIMARY KEY:
CREATE TABLE [dbo].[Persons] (
ID INT IDENTITY(1,1) PRIMARY KEY,
LastName VARCHAR(40) NOT NULL,
FirstName VARCHAR(40)
);
When JANE DOE and JOE BROWN already exist in the database.
SET IDENTITY_INSERT [dbo].[Persons] OFF;
INSERT INTO [dbo].[Persons] (FirstName,LastName)
VALUES ('JANE','DOE');
INSERT INTO Persons (FirstName,LastName)
VALUES ('JOE','BROWN');
DATABASE OUTPUT of TABLE [dbo].[Persons] will be:
ID LastName FirstName
1 DOE Jane
2 BROWN JOE
I'd check if i should update an existing record or insert a new one. As the following JAVA example:
int NewID = 1;
boolean IdAlreadyExist = false;
// Using SQL database connection
// STEP 1: Set property
System.setProperty("java.net.preferIPv4Stack", "true");
// STEP 2: Register JDBC driver
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
// STEP 3: Open a connection
try (Connection conn1 = DriverManager.getConnection(DB_URL, USER,pwd) {
conn1.setAutoCommit(true);
String Select = "select * from Persons where ID = " + ID;
Statement st1 = conn1.createStatement();
ResultSet rs1 = st1.executeQuery(Select);
// iterate through the java resultset
while (rs1.next()) {
int ID = rs1.getInt("ID");
if (NewID==ID) {
IdAlreadyExist = true;
}
}
conn1.close();
} catch (SQLException e1) {
System.out.println(e1);
}
if (IdAlreadyExist==false) {
//Insert new record code here
} else {
//Update existing record code here
}
Not OP's answer but as this was the first question that popped up for me in google, Id also like to add that users searching for this might need to reseed their table, which was the case for me
DBCC CHECKIDENT(tablename)
There could be several things causing this and it somewhat depends on what you have set up in your database.
First, you could be using a PK in the table that is also an FK to another table making the relationship 1-1. IN this case you may need to do an update rather than an insert. If you really can have only one address record for an order this may be what is happening.
Next you could be using some sort of manual process to determine the id ahead of time. The trouble with those manual processes is that they can create race conditions where two records gab the same last id and increment it by one and then the second one can;t insert.
Third, you query as it is sent to the database may be creating two records. To determine if this is the case, Run Profiler to see exactly what SQL code you are sending and if ti is a select instead of a values clause, then run the select and see if you have due to the joins gotten some records to be duplicated. IN any even when you are creating code on the fly like this the first troubleshooting step is ALWAYS to run Profiler and see if what got sent was what you expected to be sent.
Make sure if your table doesn't already have rows whose Primary Key values are same as the the Primary Key Id in your Query.

slow SQLite read speed (100 records a second)

I have a large SQLite database (~134 GB) that has multiple tables each with 14 columns, about 330 million records, and 4 indexes. The only operation used on the database is "Select *" as I need all the columns(No inserts or updates). When I query the database, the response time is slow when the result set is big (takes 160 seconds for getting ~18,000 records).
I have improved the use of indexes multiple times and this is the fastest response time I got.
I am running the database as a back-end database for a web application on a server with 32 GB of RAM.
is there a way to use RAM (or anything else) to speed up the query process?
Here is the code that performs the query.
async.each(proteins,function(item, callback) {
`PI[item] = []; // Stores interaction proteins for all query proteins
PS[item] = []; // Stores scores for all interaction proteins
PIS[item] = []; // Stores interaction sites for all interaction proteins
var sites = {}; // a temporarily holder for interaction sites
var query_string = 'SELECT * FROM ' + organism + PIPE_output_table +
' WHERE ' + score_type + ' > ' + cutoff['range'] + ' AND (protein_A = "' + item + '" OR protein_B = "' + item '") ORDER BY PIPE_score DESC';
db.each(query_string, function (err, row) {
if (row.protein_A == item) {
PI[item].push(row.protein_B);
// add 1 to interaction sites to represent sites starting from 1 not from 0
sites['S1AS'] = row.site1_A_start + 1;
sites['S1AE'] = row.site1_A_end + 1;
sites['S1BS'] = row.site1_B_start + 1;
sites['S1BE'] = row.site1_B_end + 1;
sites['S2AS'] = row.site2_A_start + 1;
sites['S2AE'] = row.site2_A_end + 1;
sites['S2BS'] = row.site2_B_start + 1;
sites['S2BE'] = row.site2_B_end + 1;
sites['S3AS'] = row.site3_A_start + 1;
sites['S3AE'] = row.site3_A_end + 1;
sites['S3BS'] = row.site3_B_start + 1;
sites['S3BE'] = row.site3_B_end + 1;
PIS[item].push(sites);
sites = {};
}
}
The query you posted uses no variables.
It will always return the same thing: all the rows with a null score whose protein column is equal to its protein_a or protein_b column. You're then having to filter all those extra rows in Javascript, fetching a lot more rows than you need to.
Here's why...
If I'm understanding this query correctly, you have WHERE Score > [Score]. I've never encountered this syntax before, so I looked it up.
[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.
An identifier is something like a column or table name, not a variable.
This means that this...
SELECT * FROM [TABLE]
WHERE Score > [Score] AND
(protein_A = [Protein] OR protein_B = [Protein])
ORDER BY [Score] DESC;
Is the same as this...
SELECT * FROM `TABLE`
WHERE Score > Score AND
(protein_A = Protein OR protein_B = Protein)
ORDER BY Score DESC;
You never pass any variables to the query. It will always return the same thing.
This can be seen here when you run it.
db.each(query_string, function (err, row) {
Since you're checking that each protein is equal to itself (or something very like itself), you're likely fetching every row. And it's why you have to filter all the rows again. And that is one of the reasons why your query is so slow.
if (row.protein_A == item) {
BUT! WHERE Score > [Score] will never be true, a thing cannot be greater than itself except for null! Trinary logic is weird. So only if Score is null can that be true.
So you're returning all the rows whose score is null and the protein column is equal to protein_a or protein_b. This is a lot more rows than you need, I guess you have a lot of rows with null scores.
Your query should incorporate variables (I'm assuming you're using node-sqlite3) and pass in their values when you execute the query.
var query = " \
SELECT * FROM `TABLE` \
WHERE Score > $score AND \
(protein_A = $protein OR protein_B = $protein) \
ORDER BY Score DESC; \
";
var stmt = db.prepare(query);
stmt.each({$score: score, $protein: protein}, function (err, row) {
PI[item].push(row.protein_B);
...
});

How to count number of rows in SQL Server database?

I am trying to count the number of unread messages in my DB Table but is proving to be very difficult. I've even read tutorials online but to no avail.
What I'm doing should be simple.
Here's what I'm trying to do:
COUNT NUMBER OF ROWS IN NOTIFICATIONSTABLE
WHERE USERID = #0 AND MESSAGEWASREAD = FALSE
Can somebody please point me in the right direction? Any help will be appreciated.
Thank you
#helper RetrievePhotoWithName(int userid)
{
var database = Database.Open("SC");
var name = database.QuerySingle("select FirstName, LastName, ProfilePicture from UserProfile where UserId = #0", userid);
var notifications = database.Query("SELECT COUNT(*) as 'counter' FROM Notifications WHERE UserID = #0 AND [Read] = #1", userid, false);
var DisplayName = "";
if(notifications["counter"] < 1)
{
DisplayName = name["FirstName"] + " " + name["LastName"];
}
else
{
DisplayName = name["FirstName"] + ", you have " + notifications["counter"] + " new messages.";
}
<img src="#Href("~/Shared/Assets/Images/" + name["ProfilePicture"] + ".png")" id="MiniProfilePicture" /> #DisplayName
database.Close();
}
SELECT COUNT(*) FROM NotificationsTable WHERE
UserID = #UserID AND MessageWasRead = 0;
Sql Count Function
Okay so this is based on what I think should be done. I don't know the underlying types, so it is going to be my best guess.
var notifications = database.QuerySingle("Select COUNT(*) as NumRecs....");
if((int)notifications["NumRecs"] > 0)) .......
I changed the query for notifications to QuerySingle. You don't need a recordest, you only need a scalar value, so that should (hopefully remove your problem with the implicit conversion in the equals you were having.
I would also check to see if your database object implements IDisposable (place it in a using statement if so) as you are calling close, and this won't actually call close (I know it's not dispose but it might have dispose too) if you encounter and exception before the close function is called.
int unreadMessageCount = db.Query("SELECT * FROM Notification WHERE UserId=#0 AND Read=#1",UserId,false).Count();
string displayname = name["FirstName"] + " " + name["LastName"] + unreadMessageCount>0?",you have " + unreadMessageCount :"";

Resources