MAX_BATCH_ROWS not working for external function in Snowflake

I have created an external function like the one below with MAX_BATCH_ROWS (this is the latest Snowflake version):
create or replace external function my_ext_function(columnValue varchar, schemeName varchar, current_user varchar, current_role varchar, current_available_roles varchar)
returns variant
MAX_BATCH_ROWS = 100000
api_integration = [aws_api_integration]
HEADERS = ('accept','application/json')
CONTEXT_HEADERS = (current_user, current_role)
as '[aws_api_post_url]'
Later I created an internal function like the one below to call the external function:
create or replace function my_ext_function_internal(columnValue varchar, schemaName varchar)
returns variant
as $$ select my_ext_function(columnValue::string, schemaName, current_user, 'null', 'null') $$
The above works great except for MAX_BATCH_ROWS. The external function calls an API Gateway endpoint in AWS which is tied to my Lambda function. I have print statements in the Lambda to display the number of rows coming from Snowflake, and it is always somewhere between 1950 and 2050. Increasing or decreasing MAX_BATCH_ROWS does not make any difference.
How can I make sure Snowflake sends 100k rows in one go to my Lambda function? How can I verify that Snowflake is sending the number of rows prescribed in MAX_BATCH_ROWS? I highly appreciate any response.

Batch sizes are not guaranteed:
Because batch size and row order are not guaranteed, writing a function that returns a value for a row that depends upon any other row in this batch or previous batches can produce non-deterministic results.
Note also that because batch size is not guaranteed, counting batches is not meaningful.
https://docs.snowflake.com/en/sql-reference/external-functions-general.html
MAX_BATCH_ROWS acts as a hint, but the actual size of each batch can't be controlled.
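To verify what is actually arriving, the most direct place to look is the Lambda side, counting the rows in each request body, which is what the print statements already do. On the Snowflake side, the query profile for a query that calls an external function reports statistics such as total invocations and rows sent, so the average batch size is roughly rows sent divided by invocations. A sketch of pulling those operator statistics follows; the 'ExternalFunction' operator type and the field names inside OPERATOR_STATISTICS are assumptions, so fall back to the query profile in the web UI if they differ:
-- run the query that calls my_ext_function_internal first, then inspect its profile
select operator_id, operator_type, operator_statistics
from table(get_query_operator_stats(last_query_id()))
where operator_type = 'ExternalFunction';  -- assumed operator type name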

Related

SQL Server - How do I get multiple rows of data into a returned variable

First question here so hoping that someone can help!
I'm doing a lot of conversions of Access back ends onto SQL Server, keeping the front end in Access.
I have come across something that I need a little help with.
In Access, I have a query that uses a user-defined function to amalgamate data from rows in a table into one variable (by opening a recordset, enumerating through it, and appending to a variable each time).
For example:
The query has a field that calls the function like this:
ProductNames: Product(ContractID)
And the VBA function "Product()" searches a table based on the ContractID, cycles through each row it finds, and concatenates the results of one field into a single variable, which is ultimately returned to the query.
Obviously, moving this query to SQL Server as a view means that the function will not be found, as it lives in Access.
Can I use a function or stored procedure in order to do the same thing? (I have never used them before)
I must stress that I cannot create, alter or drop tables at run-time due to very strict production environment security.
If someone could give me an example I'd be really grateful.
I need to be able to call it from the view as shown above.
Let's say the table I'm looking at for the data is called tbl_Products and it has 2 columns:
| ContractID | Product |
How would that be done? Any help massively appreciated!
Andy
Yes, you can most certainly do the same thing and adopt the same approach in SQL Server that you used in the past with VBA + SQL.
The easy solution would be to link to the view and then build a local query that adds the additional column. However, for reasons of performance, and simply to make converting SQL from Access to T-SQL easier, I often "duplicate" those VBA functions as T-SQL functions.
The beauty of this approach is that once you create such a function, it goes a long way towards easily converting your Access SQL to T-SQL and views.
For example, I had a GST calculation function in VBA to which you would pass an amount and a date (because the GST rate changes at known dates, in the past or future).
I used this function all over the place in my Access SQL.
When I had to convert to SQL Server, I was able to use views and pass-through queries from Access, write very similar SQL, and include that SQL function in the query just like I did in Access.
You need to create what is called a SQL function, often referred to as a scalar function. It works just like a function in VBA.
You can then use it in a T-SQL stored procedure, or even as an expression in your SQL, just like in Access.
In your example, let's assume you have a contracts table and you want to grab the "status" column (we assume text).
There could be one matching child record, several, or none.
So we will concatenate the "status" code of each child record based on the contract id.
Fire up SSMS and expand your database in the tree view, then expand "Programmability", then "Functions". You will see "Scalar-valued Functions". These functions are just like VBA functions; once created, you can use them in T-SQL code, in views, etc.
At this point, you can write T-SQL code in place of VBA code.
You don't actually have to expand the tree above, but it lets you find, see, and change the functions you create. Once created, any SQL or code for that database can use the function as an expression, just like you did in Access.
This code should do the trick:
CREATE FUNCTION [dbo].[ContractStatus]
(@ContractID int)
RETURNS varchar(255)
AS
BEGIN
    -- Declare a cursor (recordset) over the child rows for this contract
    DECLARE @tmpStatus varchar(25)
    DECLARE @MyResult varchar(255)
    SET @MyResult = ''
    DECLARE rst CURSOR
        FOR SELECT Status FROM tblContracts WHERE ID = @ContractID
    OPEN rst
    FETCH NEXT FROM rst INTO @tmpStatus
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Append a comma separator between values, then the status itself
        IF @MyResult <> ''
            SET @MyResult = @MyResult + ','
        SET @MyResult = @MyResult + @tmpStatus
        FETCH NEXT FROM rst INTO @tmpStatus
    END
    CLOSE rst
    DEALLOCATE rst
    -- Return the concatenated result of the function
    RETURN @MyResult
END
Now, in SQL, you can go:
SELECT ProjectName, ID, dbo.ContractStatus([ID]) AS MyStatus FROM tblProjects
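If you are on SQL Server 2017 or later, the cursor can be avoided entirely with STRING_AGG. A minimal sketch, using the same assumed table and column names as the cursor version above:
CREATE FUNCTION [dbo].[ContractStatus_Agg]
(@ContractID int)
RETURNS varchar(255)
AS
BEGIN
    DECLARE @MyResult varchar(255)
    -- STRING_AGG concatenates the Status values with a comma separator in one set-based statement
    SELECT @MyResult = STRING_AGG(Status, ',')
    FROM tblContracts
    WHERE ID = @ContractID
    RETURN ISNULL(@MyResult, '')
END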

Function returning a results set and an integer in SQL Server

I'm facing a quite annoying barrier enforced by SQL Server and would like to check if there is an elegant solution for this.
I have a sequence of procedure invocations (meaning, A calls B which calls C). The procedures are supposed to return different result sets, where (for instance) "A" generates its result using a set of records returned by "B".
Now, SQL Server does not allow nested INSERT INTO ... EXEC <stored procedure>, so to cope with this limitation I converted the lowest-level procedure into a function that returns a table and used INSERT INTO ... SELECT * FROM <function call> instead.
Now, there are situations in which the FUNCTION cannot return a result due to conditions of the data, and I would like the function to return a sort of code indicating the result of the execution (e.g. 0 would mean success, 1 would mean "missing input data").
Since SQL Server does not allow functions with OUTPUT parameters, I can't think of any elegant way of conveying these two outputs.
Can anyone suggest an elegant alternative?
there are situations in which the FUNCTION cannot return a result due
to conditions of the data, and I would like the function to return a
sort of code indicating the result of the execution
You really should use THROW to signal the result of execution, and since THROW cannot be used inside a function, that also rules out a table-valued function.
So you need to use a stored procedure. To avoid the restriction on nested INSERT .. EXEC, you can use temporary tables to pass data back to the calling procedure, e.g.:
create or alter procedure foo
as
begin
    if object_id('tempdb..#foo_results') is null
    begin
        print 'create table #foo_results(id int primary key, a int);';
        THROW 51000, 'The results table #foo_results does not exist. Before calling this procedure create it. ', 1;
    end
    insert into #foo_results(id, a)
    values (1, 1);
end;
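A minimal sketch of the calling side, following the hint the procedure prints (the temp table has to exist before the call):
-- caller creates the results table, runs the procedure, then reads the rows back
create table #foo_results(id int primary key, a int);
exec foo;
select * from #foo_results;
drop table #foo_results;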
Can anyone suggest an ELEGANT alternative?
I'm not sure any of the alternatives is elegant.

When we use user defined functions in SQL to return a table, why don't we use begin and end?

When we use user defined functions in SQL to return a table, why don't we use BEGIN and END?
e.g.
CREATE FUNCTION Customers
(@minId int)
RETURNS TABLE
AS
RETURN (SELECT *
        FROM TrackingItem ti
        WHERE ti.Id >= @minId)
works
but
CREATE FUNCTION Customers
(@minId int)
RETURNS TABLE
AS
BEGIN
    RETURN (SELECT *
            FROM TrackingItem ti
            WHERE ti.Id >= @minId)
END
doesn't work
You did not tag the DBMS, but from your syntax I guess this is SQL Server...
Good approach: inline TVF
The inline syntax (without BEGIN...END) works like a VIEW with parameters. On usage it will be fully inlined by the query optimizer, almost as if the code were written in the place where the function is used. This means full use of indexes, statistics, cached results, and so on.
But you must be able to write your full logic in one single statement, which is possible in most cases, though not all.
Bad approach: multi-statement TVF
The second example needs a table definition that fits the result you want to return. Your code has to INSERT into this table variable and then return it; this is missing in your example.
This approach should be avoided whenever possible. The query optimizer cannot look into this code in order to predict the result, and it cannot use intermediate / cached results or indexes / statistics the way an inlined query would.
The first one, without the BEGIN and END is called an "Inline Table Valued Function". You can only have the single statement inside the function body to immediately return the resultset.
The second one with the BEGIN and END is called a "Multi Statement Table Valued Function". You can have multiple statements inside the function body and then at the bottom have a RETURN statement to return the resultset. This allows you to e.g. populate a TABLE variable and then return it.
An Inline Table Valued Function can be thought of like a view - in it gets expanded out into the calling query, statistics on the underlying tables can be used to give a better execution plan.
Multi Statement Table Valued Functions do not expand out like this, and don't get the same benefits when an execution plan is created.
So, it's best practice to avoid multi statement table valued functions and prefer inline ones instead, in order to reap these benefits and avoid potential performance issues.
You can use it in this way: you are returning table-type information, so you have to define a table variable with columns and return the data through that table. Try the following, supposing you have 2 columns, id and name:
CREATE FUNCTION Customers(@minId INT)
RETURNS @table TABLE (id INT, name VARCHAR(20))
AS
BEGIN
    INSERT INTO @table
    SELECT id, name
    FROM TrackingItem ti
    WHERE ti.Id >= @minId
    RETURN
END
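Either version is then queried like a table; a quick usage sketch (100 is just an example minimum id):
SELECT *
FROM dbo.Customers(100);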

How to optimize replace and find function?

I am trying to create a function which can replace certain words with hyperlinks in SQL. When I call the function in a query, it takes a really long time to execute, more than 2-3 min. I assume this is because the tag_library table has around 600,000 records and iterating through that large number consumes a lot of processing time.
CREATE FUNCTION dbo.ReplaceTags(@body VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    SELECT @body = REPLACE(@body, name, '' + name + '')
    FROM Tag_Library
    RETURN @body
END
article table (id, title, body)
1, Story1, At the same time there is a list consisting of: DUCHS, EUROC, GLSPE and WODST. Only two of the tags have covered with the prices in the last three months - GROSV at 99.11 on 8 October and JUBIL at 0s on 11 September.
tag_library table (id, name)
1,DRYDN33
2,DUCHS
3,DRYDN33
4,DRYDN15
5,EUROC
6,DRYDN15
7,GROSV
Hence, I am writing to seek some advice on whether there is a way to make this SQL function optimal, or whether it would be better to change this function into an insert trigger?
Please advise, if possible.
Just a thought, I did not test it:
Change your query to this one:
SELECT
    @body = REPLACE(@body, name, '' + name + '')
FROM
    Tag_Library
WHERE
    @body LIKE '%' + name + '%'
This should filter the Tag_Library table to only those records which are present in the input string, so SQL Server does not have to process lots of unnecessary records (replaces). BUT it will not prevent a full table / index scan to check the table!
You can improve this solution by storing the required tags per article in a separate table (and updating that table via triggers when the source records/tables change). In that case you can use joins to filter the Tag_Library table instead of the LIKE operator, but it requires extra code to maintain the dictionary; a rough sketch is shown below.
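A rough sketch of that per-article mapping table, assuming the article and Tag_Library tables look like the samples above; the Article_Tags name and the trigger-driven population are assumptions:
-- mapping of which tags appear in which article, maintained when articles change
CREATE TABLE Article_Tags (
    article_id INT NOT NULL,
    tag_id     INT NOT NULL,
    PRIMARY KEY (article_id, tag_id)
);
-- (re)populate the mapping for one article, e.g. from an insert/update trigger
INSERT INTO Article_Tags (article_id, tag_id)
SELECT a.id, t.id
FROM article a
JOIN Tag_Library t ON a.body LIKE '%' + t.name + '%'
WHERE a.id = 1;  -- the article being (re)indexed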
You're focusing on the wrong thing. The problem is that this is a scalar function, and they perform miserably. You should change it to a table-valued function that returns a single row and use APPLY.
See, for example:
http://dataeducation.com/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post/
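A minimal sketch of the shape that article describes: wrap the work in an inline table-valued function that returns a single row, then call it with CROSS APPLY instead of invoking a scalar UDF per row. The tag-replacement logic itself still has to be reworked into a single statement, so the REPLACE below is only a placeholder:
CREATE FUNCTION dbo.ReplaceTags_TVF (@body VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
    -- placeholder for the real single-statement replacement logic
    SELECT REPLACE(@body, 'DUCHS', 'DUCHS') AS body
);
GO
SELECT a.id, a.title, r.body
FROM article AS a
CROSS APPLY dbo.ReplaceTags_TVF(a.body) AS r;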

Mongodb ObjectId generator as SQL Server proc

I have a hybrid application where part of data (mostly legacy) is stored in SQL Server and another part in Mongodb. I just converted all primary key types in SQL Server to use ObjectId which I generate in the application when inserting new records into SQL Server.
Now, I found that I need to clone some template records (about 10-20 records at a time), and in order to do that I need to be able to generate ObjectId values via a SQL Server function or stored proc.
Is it possible and is there code available?
This question is old, but I was trying to do the same thing. This is what I came up with on SQL Server 2012.
Create Function NewObjectId(@counter binary(3))
returns binary(12)
as
begin
    declare @epoch datetime2, @seconds binary(4), @process binary(2), @hostname binary(3)
    set @epoch = '1/1/1970'
    -- 4-byte timestamp: seconds since the Unix epoch
    select @seconds = cast(Datediff(ss, @epoch, getutcdate()) as binary(4))
    -- 3-byte machine identifier: first bytes of the MD5 hash of the host name
    select @hostname = cast(HashBytes('MD5', HOST_NAME()) as binary(3))
    -- 2 bytes for the process id portion, using the current session id (@@SPID)
    select @process = cast(@@SPID as binary(2))
    declare @objectId binary(12)
    select @objectId = (@seconds + @hostname + @process + @counter)
    return @objectId
end
This can be called like this:
select dbo.NewObjectId(CRYPT_GEN_RANDOM(3))
The reason CRYPT_GEN_RANDOM(3) is passed in is because calling that function is apparently side-effecting and it can't be used inside another function. I would have preferred to use an incrementing sequence for the counter portion, but a random number works as well.
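If you do want an incrementing counter for that portion, one sketch (the sequence name is an assumption) is a SEQUENCE that wraps at 3 bytes; NEXT VALUE FOR has the same restriction of not being usable inside a function, but it can be assigned to a variable in the calling batch:
CREATE SEQUENCE dbo.ObjectIdCounter AS int
    START WITH 0 INCREMENT BY 1
    MINVALUE 0 MAXVALUE 16777215 CYCLE  -- 16777215 = 0xFFFFFF, so the counter wraps at 3 bytes
GO
DECLARE @counter int = NEXT VALUE FOR dbo.ObjectIdCounter
SELECT dbo.NewObjectId(CAST(@counter AS binary(3)))  -- int to binary(3) keeps the low-order bytes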
Also, I noticed that you said you are using a char(24) to store the value. This returns a binary(12) since that's what the MongoDB ObjectIds are. Using binary(12) also requires half of the space to store the value.
I'm sure this isn't helpful now, but it was a fun problem to solve.
I'm going to try taking my ObjectID C# code and see if I can load it as a CLR function into SQL Server. That might give better results and performance.
I think you can use the NEWID function, which generates a 16-byte uniqueidentifier.
But in MongoDB the BSON ObjectId datatype is a 12-byte binary value.
Try this
SELECT LEFT(REPLACE(CAST(NEWID() as varchar(36)),'-',''),24)
Hope, this helps.
EDITED
The article Object IDs describes the BSON ObjectID specification. The format includes:
TimeStamp. This is a unix-style timestamp. It is a signed int representing the number of seconds before or after January 1st 1970 (UTC).
Machine. This is the first three bytes of the (md5) hash of the machine host name, or of the mac/network address, or the virtual machine id.
Pid. This is 2 bytes of the process id (or thread id) of the process generating the object id.
Increment. This is an ever-incrementing value, or a random number if a counter can't be used in the language/runtime.
The server itself and almost all drivers use the format above.
So, it is impossible to generate a MongoDB ObjectID in SQL Server.
The only way to solve this problem is to change the logic of the application.
