Performance issues while passing UDTT[] to a Postgres function

I have created a function in Postgres that takes a UDTT[] as an input parameter, with the goal of inserting that data into a table.
An example UDTT:
create type udtt_mytype as
(
id uuid,
payload int
);
An example function is then something akin to:
CREATE OR REPLACE FUNCTION dbo.p_dothething(p_import udtt_mytype[])
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
insert into mytab select * from unnest($1);
RETURN;
END
$function$;
My C# backend presently looks like
public class udtt_mytype
{
[PgName("id")]
public Guid id{ get; set; }
[PgName("payload ")]
public int payload { get; set; }
}
var payload = CreateAndFillUdttMyType();
var conn = new NpgsqlConnection();
conn.Open();
var transaction = conn.BeginTransaction();
conn.MapComposite<udtt_mytype>("udtt_mytype");
var command = new NpgsqlCommand("dbo.p_dothething", conn);
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(new NpgsqlParameter
{
    ParameterName = "p_import",
    Value = payload,
    NpgsqlDbType = NpgsqlTypes.NpgsqlDbType.Array | NpgsqlTypes.NpgsqlDbType.Composite
});
var result = command.ExecuteScalar();
transaction.Commit();
conn.Close();
While the above works, it is pretty non-performant compared to a similar UDTT -> stored procedure call in SQL Server. Prior to our Npgsql implementation this took under a second; now I am seeing about 6 seconds per 6k rows (and the common usages for this involve much higher row counts).
Using some timestamping and returning from the SP, I can see that processing the data inside the function isn't the issue at all; it appears to be entirely transfer time of the payload. In this particular case it is a simple array of udtt_mytype. With a single object, execution is instantaneous, but with 6k objects it is in the 6-7 second range. The performance is the same even if I pass the array to an empty function (removing the cost of the unnest/insert).
In reality, udtt_mytype has 12 columns of various types, but we are still talking about a relatively 'shallow' object.
I have attempted to compare this to Npgsql's documentation on bulk copy (found here: http://www.npgsql.org/doc/copy.html), and that implementation seemed to be even slower, which seems contradictory.
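A copy-based import for this shape of data would look roughly like the sketch below (a sketch only, assuming a target table mytab(id uuid, payload int); writer.Complete() is the commit call on Npgsql 4+, while on 3.x the import commits when the writer is disposed):
using (var writer = conn.BeginBinaryImport("COPY mytab (id, payload) FROM STDIN (FORMAT BINARY)"))
{
    foreach (var row in payload)
    {
        writer.StartRow();
        writer.Write(row.id, NpgsqlTypes.NpgsqlDbType.Uuid);
        writer.Write(row.payload, NpgsqlTypes.NpgsqlDbType.Integer);
    }
    writer.Complete(); // without this, Npgsql 4+ rolls the import back on dispose
}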
Is Postgres typically this much slower than MSSQL, or is there something that may be limiting the transfer rate of the data that I'm not aware of? Obviously no one can speak for my network connectivity/hardware setup, but for anyone who has converted between the two: was a performance difference seen on this same scale?

Related

Where is stored procedure result set stored while SqlReader is reading?

As per the docs, SqlCommand.CommandTimeout is:
This property is the cumulative time-out (for all network packets that
are read during the invocation of a method) for all network reads
during command execution or processing of the results. A time-out can
still occur after the first row is returned, and does not include user
processing time, only network read time.
For example, with a 30 second time out, if Read requires two network
packets, then it has 30 seconds to read both network packets. If you
call Read again, it will have another 30 seconds to read any data that
it requires.
I have the code below, which executes the stored procedure and then reads the data row by row using SqlDataReader.
public static async Task<IEnumerable<AvailableWorkDTO>> prcGetAvailableWork(this MyDBContext dbContext, int userID)
{
var timeout = 120;
var result = new List<AvailableWorkDTO>();
using (var cmd = dbContext.Database.GetDbConnection().CreateCommand())
{
var p1 = new SqlParameter("@UserID", SqlDbType.Int)
{
Value = userID
};
cmd.CommandText = "dbo.prcGetAvailableWork";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(p1);
cmd.CommandTimeout = timeout;
await dbContext.Database.OpenConnectionAsync().ConfigureAwait(false);
using (var reader = await cmd.ExecuteReaderAsync().ConfigureAwait(false))
{
while (await reader.ReadAsync().ConfigureAwait(false))
{
var item = new AvailableWorkDTO();
item.ID = reader.GetInt32(0);
item.Name = reader.GetString(1);
item.Title = reader.GetString(2);
item.Count = reader.GetInt32(3);
result.Add(item);
}
}
}
return result;
}
In SQL Profiler I see only one call to the stored procedure, as expected. So I am guessing the stored proc executes and returns the entire result set.
Questions
1) If SqlDataReader is reading one record at a time, where is the entire result set stored while the reader is reading? Is it temporarily held in SQL Server memory or in application server memory?
2) Using EF Core, is there any way to read the entire result set at once?
The result set isn't stored anywhere; it's streamed directly to the client. As the server reads rows from disk or memory, they are fed through the query plan and out across the network. This is why you always need to make sure you read as fast as possible and dispose the reader and connection: the query is running the whole time.
To "read the entire result set at once", you just do what you are doing now: loop the reader and add it to a List. Alternatively, you could use DataTable.Load, however I do not advise this, and it is also not async.
The reader is just an object capable of returning individual rows from a command. What you see in the profiler is a single execution of a command. If you also monitor the SQL:BatchCompleted event, you will see that it only fires once the reader is finished.
You can use a stored procedure with EF Core instead of ADO.NET, but I am not sure it will be faster.
Create a special class to receive the data from the stored procedure, or use the existing AvailableWorkDTO. This class should have a property for every column in the select clause of your stored procedure. You don't need to select everything in your stored procedure - just select the columns that AvailableWorkDTO has - and add the NotMapped attribute to the class:
[NotMapped]
public class AvailableWorkDTO
{
.....
}
After this, add the class to your DbContext as a DbSet:
public virtual DbSet<AvailableWorkDTO> AvailableWorkDTOs { get; set; }
And this is a sample function to show how to get data using the stored procedure
public async Task<IEnumerable<AvailableWorkDTO>> prcGetAvailableWork(MyDBContext dbContext, int userID)
{
var pId = new SqlParameter("@UserID", userID);
return await dbContext.Set<AvailableWorkDTO>()
.FromSqlRaw("Execute db.prcGetAvailableWork #UserID", pId)
.ToArrayAsync();
}

System.NotSupportedException: Commands with multiple queries cannot have out parameters

I ran into another issue when using a data reader around a sproc that returns multiple ref cursors: I am getting a NotSupportedException. I can see where it comes from in the Npgsql source code; however, I am not sure I agree with throwing that exception. The code we have written works with Oracle (both fully managed and managed flavors) and SQL Server. Any help is appreciated in keeping the API consistent across some of those key flavors of DBMS out there.
sproc body
CREATE OR REPLACE FUNCTION public.getmultipleresultsets (
    v_organizationid integer)
RETURNS setof refcursor
LANGUAGE 'plpgsql'
AS $BODY$
declare
    cv_1 refcursor;
    cv_2 refcursor;
BEGIN
    open cv_1 for
        SELECT a.errorCategoryId, a.name, a.bitFlag
        FROM ErrorCategories a
        ORDER BY name;
    RETURN next cv_1;

    open cv_2 for
        SELECT *
        FROM StgNetworkStats;
    RETURN next cv_2;
END;
$BODY$;
Key reader code that wraps the Npgsql provider (an Entlib implementation over Npgsql):
private IDataReader DoExecuteReader(DbCommand command, CommandBehavior cmdBehavior)
{
    var sql = new StringBuilder();
    using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        while (reader.Read())
        {
            sql.AppendLine($"FETCH ALL IN \"{reader.GetString(0)}\";");
        }
    }
    command.CommandText = sql.ToString();
    command.CommandType = CommandType.Text;
    return command.ExecuteReader(cmdBehavior);
}
The command-building code is shown below:
Helper.InitializeCommand(cmd, 300, "getmultipleresultsets");
db.AddReturnValueParameter(cmd);
db.AddInParameter(cmd, "organizationId", DbType.Int32, ORGANIZATIONID);
db.AddCursorOutParameter(cmd, "CV_1");
db.AddCursorOutParameter(cmd, "CV_2");
The code that adds the refcursor parameter goes something like this:
public override void AddCursorOutParameter(DbCommand command, string RefCursorName)
{
    NpgsqlParameter parameter = (NpgsqlParameter)CreateParameter(RefCursorName, false);
    parameter.NpgsqlDbType = NpgsqlDbType.Refcursor;
    parameter.NpgsqlValue = DBNull.Value;
    parameter.Direction = ParameterDirection.Output;
    command.Parameters.Add(parameter);
}
Your function is declared to return a set of refcursors - this is not the same thing as two output parameters; you seem to be confusing the name of the cursor (cursors have names, but ints, for example, do not) with the name of the parameter (int parameters do have names).
Please note that PostgreSQL does not actually have output parameters - a function always returns a single table, and that's it. PostgreSQL does have a function syntax with output parameters, but that is only a way to construct the schema of the output table. This is unlike SQL Server, which apparently can return both a table and a set of named output parameters. To facilitate portability, when reading results, if Npgsql sees any NpgsqlParameter with an out direction, it will attempt to find a resultset with the name of the parameter and will simply populate that NpgsqlParameter's Value with the first row's value for that column. This practice has zero added value over simply reading the resultset yourself - it's just there for compatibility.
To sum it up, I'd suggest you read the refcursors with your reader and then fetch their results as appropriate.
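A rough sketch of that approach, using the getmultipleresultsets function above (organizationId here is a stand-in for your input value; the FETCH commands must run inside the same transaction that created the cursors, because refcursors do not survive past it):
using (var tx = conn.BeginTransaction())
{
    var cursorNames = new List<string>();
    using (var cmd = new NpgsqlCommand("SELECT public.getmultipleresultsets(@org)", conn, tx))
    {
        cmd.Parameters.AddWithValue("org", organizationId);
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                cursorNames.Add(reader.GetString(0)); // each row is a cursor name
    }
    foreach (var name in cursorNames)
    {
        using (var fetch = new NpgsqlCommand($"FETCH ALL IN \"{name}\"", conn, tx))
        using (var rs = fetch.ExecuteReader())
        {
            while (rs.Read())
            {
                // consume this cursor's result set
            }
        }
    }
    tx.Commit();
}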

Refcursor in stored procedure with postgres

I am a rookie/newbie to the Postgres data access API. I have worked a bit with Oracle and SQL Server and am trying to do what I have done with those DBMSs.
The use case is very simple:
1) A stored procedure, aka function, with input params
2) Returning one or more ref cursors
3) Using an Ent Lib wrapper to use the Npgsql provider/database with it
4) Doing a data adapter fill and running into an issue with the cursor dereferencing, even though it appears I am inside a transaction
5) I just want to get a simple working sample with the latest Npgsql provider
Here is my function
CREATE OR REPLACE FUNCTION public.geterrorcategories(
v_organizationid integer)
RETURNS refcursor
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE cv_1 refcursor;
BEGIN
open cv_1 for
SELECT errorCategoryId, name, bitFlag
FROM ErrorCategories
ORDER BY name;
RETURN cv_1;
END;
$BODY$;
The code using the Enterprise Library API/wrapper is as follows.
/// <summary>
/// Executes GetErrorCategories in case of SQL Server or GetErrorCategories for Oracle
/// </summary>
public static DataTable GetErrorCategoriesAsDataTable(string dbKey, int? ORGANIZATIONID)
{
DataTable tbl = new DataTable();
Database db = Helper.GetDatabase(dbKey);
using (DbConnection con = db.CreateConnection()){
con.Open();
var tran = con.BeginTransaction();
using (DbCommand cmd = con.CreateCommand()){
cmd.Transaction = tran;
BuildGetErrorCategoriesCommand(db, cmd ,ORGANIZATIONID);
cmd.CommandText = "GetErrorCategories";
try {
Helper.FillDataTable(tbl, db, cmd);
con.Close();
} catch (DALException ) {
throw;
}
}
}
return tbl;
}
The command is built as follows.
private static void BuildGetErrorCategoriesCommand(Database db, DbCommand cmd, int? ORGANIZATIONID){
Helper.InitializeCommand(cmd, 300, "GetErrorCategories");
db.AddReturnValueParameter(cmd);
db.AddInParameter(cmd, "organizationId", DbType.Int32, ORGANIZATIONID);
db.AddCursorOutParameter(cmd, "CV_1");
}
I am not getting any error. I get only one row back, which I think is the cursor name (something like unnamed_portal_1), not the results from my table that the query returns.
It is frustrating, as I would like to keep my application code as unchanged as possible while switching providers at run time. I am using a tweaked 'Ent Lib' contribution database that was created for Npgsql.
Hope this helps to point me to the right areas to look at.
There is absolutely no reason here to declare your PostgreSQL function to return a cursor - you can simply return a table; see the PostgreSQL docs for more info.
Npgsql originally had a feature where it automatically "dereferenced" cursors returned from functions, but this has been removed; for more information about this see this issue (warning, it's long...). Some people are requesting that the feature be brought back.
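As a sketch of the table-returning alternative (the column types here are assumptions): declare the function as RETURNS TABLE (errorcategoryid int, name text, bitflag boolean) with a plain SELECT body, and the client side then needs no cursor or transaction handling at all:
using (var cmd = new NpgsqlCommand("SELECT * FROM public.geterrorcategories(@orgId)", conn))
{
    cmd.Parameters.AddWithValue("orgId", organizationId);
    var tbl = new DataTable();
    using (var reader = cmd.ExecuteReader())
        tbl.Load(reader); // fills the DataTable directly, no FETCH or refcursor involved
}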

Dapper with Access, update statement partially not working

I have a Products class and tried to evaluate Dapper with an Access database. Select, delete and insert operations are working fine, but I have a problem with the update operation. It works in one direction only (code below).
When I try to change the Description based on ProductNumber it works (updateStatement2) and the Description gets updated, but when I try to change the ProductNumber based on Description (updateStatement1) it doesn't work and the ProductNumber doesn't get updated. This seems a bit strange to me. Is it a bug, or am I missing something? My database is just a basic one with no primary keys set. I have attached a screenshot below.
(For more information see my code below.)
public class Products
{
public string ProductNumber { get; set; }
public string Description { get; set; }
}
static void Main(string[] args)
{
using (var con = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=test.mdb"))
{
Products product2 = new Products();
product2.ProductNumber = "P2";
product2.Description = "TestProduct2Changed";
var updateStatement2 = @"Update Products Set Description = @Description Where ProductNumber = @ProductNumber";
int outp2 = con.Execute(updateStatement2, product2);
Products product1 = new Products();
product1.ProductNumber = "P3Changed";
product1.Description = "TestProduct3";
var updateStatement1 = @"Update Products Set ProductNumber = @ProductNumber Where Description = @Description";
int outp1 = con.Execute(updateStatement1, product1);
}
}
I am using Dapper version 1.50.2. This is my database screenshot
It looks like ADO Access commands require the parameters to be present in the same order as they appear in the SQL query.
In your original code, for the query that works, the parameters appear in the query string in alphabetical order -
Update Products Set Description = @Description Where ProductNumber = @ProductNumber
This works because the properties are taken from "product2" in alphabetical order. This may not be by design; it might just be the order in which reflection lists them.
In the query that fails, the parameters appear in reverse alphabetical order -
Update Products Set ProductNumber = @ProductNumber Where Description = @Description
.. and this fails because the parameter values get mis-assigned within Access.
You should be able to confirm this by changing the order of the parameters in your dynamic parameter alternative. I tried using dynamic parameters, and it worked when the parameters were in the same order in which they appeared in the SQL query, but failed when they weren't. The database I'm using isn't quite the same as yours, but the following should illustrate what I'm talking about:
// Doesn't work (parameter order is incorrect)
con.Execute(
    "Update People Set PersonName = @PersonName Where Notes = @Notes",
    new { Notes = "NotesChanged", PersonName = "New Name" }
);
// DOES work (parameter order is correct)
con.Execute(
    "Update People Set PersonName = @PersonName Where Notes = @Notes",
    new { PersonName = "New Name", Notes = "NotesChanged" }
);
While trying to find more information about this, I came across this answer, which unfortunately seems to confirm the issue: https://stackoverflow.com/a/11424444/3813189
I guess it might be possible for the custom SQL generator that you've mentioned in one of your other questions to do some magic to parse the query, retrieve the parameters in the order in which they must appear, and then ensure they are provided in that order. If someone is maintaining an Access connector for DapperExtensions, it might be worth raising an issue, because at the moment I think you are correct and it is an issue with the library.
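For reference, a sketch of the same workaround using Dapper's DynamicParameters type instead of an anonymous object, adding the parameters in the order in which they appear in the SQL:
var p = new DynamicParameters();
p.Add("ProductNumber", "P3Changed"); // appears first in the statement below
p.Add("Description", "TestProduct3");
con.Execute("Update Products Set ProductNumber = @ProductNumber Where Description = @Description", p);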

Execute multiple stored procedures with single trip to database

I have a lot of legacy data access code, mainly SqlCommand with stored procedure calls, that we use to execute a lot of insert statements into a database.
As long as the SQL Server has been on the same machine as the application, performance has been acceptable, but now we are trying to move some of the data to SQL Azure.
The problem is that our code calls an SP for every record to insert, which results in quite a few round trips to the database, and when the database is not located on the same server this takes some time.
var conn = new SqlConnection("connString");
var cmd = new SqlCommand("spMyStoreProc", conn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@a", SqlDbType.VarChar, 10);
cmd.Parameters.Add("@b", SqlDbType.Int);
using (conn)
{
    conn.Open();
    foreach (var rec in recordsToInsert)
    {
        cmd.Parameters["@a"].Value = rec.A;
        cmd.Parameters["@b"].Value = rec.B;
        cmd.ExecuteNonQuery();
    }
    conn.Close();
}
I have tried the code above with and without Transactions.
I have also tried to use a "batch" SQL statement to execute several SPs in every trip to the server.
Like this:
var cmd = conn.CreateCommand();
cmd.CommandText = "EXEC spMyStoreProc @a='a', @b=2; EXEC spMyStoreProc @a='b', @b=4;";
It greatly increases the performance of the operation, but since I have quite a few SPs, where every SP has about 20-50 params, it gets quite tedious to write this code for all the insert commands in this data access component.
Is this the best way to achieve this, or can I somehow tell ADO.NET that I want to execute my calls as a batch (I haven't found anything suggesting it's possible, but feel that I at least should ask) to avoid the network latency between every single SP call?
If not, does anybody know a good way to achieve this without having to write it "by hand"? Since it's a legacy application, I cannot change the data layer completely.
Is there any application that can take SqlCommands with parameters and generate the T-SQL they would execute?
Thanks in advance
You should probably have one stored procedure that calls all the other stored procedures - it will probably be the least amount of work. From the code you then call a stored procedure only once... so, given that they are the same parameters you are passing every time (because your code seems to imply that), you would basically do something like this:
CREATE PROCEDURE sp_RunBatch(@param1, @param2, etc [all the parameters you need])
AS
exec spMyStoreProc @a='a'
exec spMyStoreProc2 @b='b'
The advantages of this are many, some of which being that it's all centralized and you can even wrap all of the calls in a transaction, so as not to do dirty inserts (given that they all depend on each other).
Also, if you don't feel like passing 20/30 parameters to each SP, you may want to make a user-defined table type for each set of parameters that you can pass. Then each SP gets 1 or 2 parameters, and the code becomes much simpler and more readable (see the sketch after the links below).
EDIT:
This is a good reference for the user-defined table types: http://msdn.microsoft.com/en-us/library/bb675163.aspx
And this is how to pass the table valued types to SQL server: http://msdn.microsoft.com/en-us/library/bb675163.aspx
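To make the table-type suggestion concrete, here is a minimal sketch; the type name dbo.MyStoreProcParams and the procedure spMyStoreProcBatch are hypothetical and would be created on the SQL Server side first:
// Assumed to exist on the server:
//   CREATE TYPE dbo.MyStoreProcParams AS TABLE (a varchar(10), b int)
//   CREATE PROCEDURE spMyStoreProcBatch @rows dbo.MyStoreProcParams READONLY AS ...
var table = new DataTable();
table.Columns.Add("a", typeof(string));
table.Columns.Add("b", typeof(int));
foreach (var rec in recordsToInsert)
    table.Rows.Add(rec.A, rec.B);

using (var conn = new SqlConnection("connString"))
using (var cmd = new SqlCommand("spMyStoreProcBatch", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@rows", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.MyStoreProcParams";
    conn.Open();
    cmd.ExecuteNonQuery(); // one round trip for the whole batch
}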
An alternative to M.R.'s approach would be to send all your parameters as an XML document, then parse the XML document inside the SP to extract the parameters. This may simplify the interface a bit.
However, I think you were on to something when you discussed the possibility of chaining all the commands into a single string. But instead of building them manually, consider building an extension method on the SqlCommand object that returns a single string for execution, leveraging the sp_executesql syntax, and execute the entire string in a single pass.
So you would have a loop that looks like this, and you would call a new ToInlineSql extension method:
string sqlCommand = "";
foreach (var rec in recordsToInsert)
{
    cmd.Parameters["@a"].Value = rec.A;
    cmd.Parameters["@b"].Value = rec.B;
    sqlCommand += cmd.ToInlineSql();
}
// execute sqlCommand
The ToInlineSql extension method could look like this (pseudo-code - you will have to add certain things, such as checking the data type and quoting values accordingly; see the sp_executesql documentation):
public static class SqlCmdExt
{
    public static string ToInlineSql(this SqlCommand cmd)
    {
        string sql = "sp_executesql " + cmd.CommandText;
        foreach (SqlParameter p in cmd.Parameters)
        {
            sql += ", " + p.ParameterName + " " + p.SqlDbType.ToString();
            sql += ", " + p.ParameterName + " = " + p.Value;
        }
        sql += ";";
        return sql;
    }
}
