Npgsql default parameter behaviour

If I have an update statement such as update foo set bar = @bar, baz = @baz, and create a command which is missing the parameters, it appears that the update will use the current values for those columns.
I haven't been able to find documentation for this in either Npgsql or PostgreSQL - is it a supported feature that I can rely on, or just something that happens?
Trivial example:
using System;
using Npgsql;

namespace MissingParametersUpdate
{
    static class Program
    {
        // you will need to have CREATE TABLE foo ( bar integer, baz integer )
        static void Main(string[] args)
        {
            using (var connection = new NpgsqlConnection(args[0]))
            {
                connection.Open();

                using (var command = connection.CreateCommand())
                {
                    command.CommandText = @"delete from foo";
                    command.ExecuteNonQuery();
                }

                using (var command = connection.CreateCommand())
                {
                    command.CommandText = @"insert into foo ( bar, baz ) values ( 1, 2 ), ( 3, 4 )";
                    command.ExecuteNonQuery();
                }

                DumpValues("Initial", connection);

                // empty update: neither @bar nor @baz is supplied
                using (var command = connection.CreateCommand())
                {
                    command.CommandText = @"update foo set bar = @bar, baz = @baz";
                    command.ExecuteNonQuery();
                }

                DumpValues("Empty Update", connection);

                // update bar only
                using (var command = connection.CreateCommand())
                {
                    command.CommandText = @"update foo set bar = @bar, baz = @baz";
                    command.Parameters.AddWithValue(@"bar", 42);
                    command.ExecuteNonQuery();
                }

                DumpValues("Update Bar", connection);

                // update baz only
                using (var command = connection.CreateCommand())
                {
                    command.CommandText = @"update foo set bar = @bar, baz = @baz";
                    command.Parameters.AddWithValue(@"baz", 12);
                    command.ExecuteNonQuery();
                }

                DumpValues("Update Baz", connection);
            }
        }

        private static void DumpValues(string caption, NpgsqlConnection connection)
        {
            Console.WriteLine(caption);
            using (var command = connection.CreateCommand())
            {
                command.CommandText = @"select bar, baz from foo";
                using (var reader = command.ExecuteReader())
                    while (reader.Read())
                        Console.WriteLine(" (bar: {0}, baz: {1})", reader.GetInt32(0), reader.GetInt32(1));
            }
            Console.WriteLine();
        }
    }
}

This is indeed a bit odd, here's what's going on.
PostgreSQL accepts positional parameter placeholders in the format $1, $2, etc. However, it is somewhat standard in .NET to use named placeholders, e.g. @bar, @baz. To support this, Npgsql parses your SQL client-side to find any parameter placeholders (e.g. @bar). When one is found, it looks for an NpgsqlParameter with the corresponding name on the NpgsqlCommand, and replaces it with a PostgreSQL-compatible positional placeholder (e.g. $1).
Now, if Npgsql encounters a placeholder without a corresponding NpgsqlParameter, it simply leaves it alone. At one point it threw an exception instead, but there were cases in which Npgsql's internal SQL parser wasn't good enough and wrongly identified portions of the query as parameter placeholders; leaving unmatched placeholders untouched resolves this issue.
All of which is to say that PostgreSQL receives the literal SQL update foo set bar = @bar, baz = @baz, without any manipulation on Npgsql's part. Now, PostgreSQL considers @ a special character - you can define it as an operator (see https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html). However, by default @ doesn't appear to do anything at all here, so what you actually ran was update foo set bar = bar, baz = baz, which obviously changes nothing. You can see the @ behavior by executing SELECT @foo - PostgreSQL will respond with an error saying column "foo" does not exist.
So it's an interplay between Npgsql leaving the query as-is because the parameter isn't set, and PostgreSQL ignoring the @.
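For contrast, a minimal sketch (reusing the foo table and connection from the example above) where both parameters are supplied, so Npgsql rewrites both placeholders and the UPDATE actually changes the rows:
using (var command = connection.CreateCommand())
{
    command.CommandText = @"update foo set bar = @bar, baz = @baz";
    command.Parameters.AddWithValue(@"bar", 100);  // @bar is rewritten to $1
    command.Parameters.AddWithValue(@"baz", 200);  // @baz is rewritten to $2
    command.ExecuteNonQuery();
}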

Related

Preparing statements and batching in npgsql

The Simple Preparation example in the docs (https://www.npgsql.org/doc/prepare.html#simple-preparation) shows parameters being set after the command is prepared.
var cmd = new NpgsqlCommand(...);
cmd.Parameters.Add("param", NpgsqlDbType.Integer);
cmd.Prepare();
// Set parameters
cmd.ExecuteNonQuery();
// And so on
Questions
How are the parameters set?
Is it possible to use AddWithValue instead of Add, via the AddWithValue(String, NpgsqlDbType, Object) overload that specifies an NpgsqlDbType -- the docs say "setting the value isn't supported"?
How does this work if multiple statements exist in the same command?
This answer (https://stackoverflow.com/a/53268090/10984827) shows that multiple commands in a single string can be prepared together, but it's not clear how this CommandText string is created.
Edit: I think I'm almost there, but I'm not sure how to create and execute the batched query string. Here's my naive attempt at building a batched query using a StringBuilder. This doesn't work. How do I do this correctly?
using System;
using System.Collections.Generic;
using System.Text;
using Npgsql;
using NpgsqlTypes;

class Model
{
    public int value1 { get; }
    public int value2 { get; }

    public Model(int value1, int value2)
    {
        this.value1 = value1;
        this.value2 = value2;
    }
}

class Program
{
    static void Main(string[] args)
    {
        var dataRows = new List<Model>();
        dataRows.Add(new Model(3, 2));
        dataRows.Add(new Model(27, -10));
        dataRows.Add(new Model(11, -11));

        var connString = "Host=127.0.0.1;Port=5432;Username=postgres;Database=dbtest1";

        // tabletest1
        // ----------
        // id SERIAL PRIMARY KEY
        // , value1 INT NOT NULL
        // , value2 INT NOT NULL

        using (var conn = new NpgsqlConnection(connString))
        {
            conn.Open();

            var cmd = new NpgsqlCommand();
            cmd.Connection = conn;
            cmd.CommandText = "INSERT INTO tabletest1 (value1,value2) VALUES (@value1,@value2)";
            var parameterValue1 = cmd.Parameters.Add("value1", NpgsqlDbType.Integer);
            var parameterValue2 = cmd.Parameters.Add("value2", NpgsqlDbType.Integer);
            cmd.Prepare();

            var batchCommand = new StringBuilder();
            foreach (var d in dataRows)
            {
                parameterValue1.Value = d.value1;
                parameterValue2.Value = d.value2;
                batchCommand.Append(cmd.CommandText);
                batchCommand.Append(";");
            }
            Console.WriteLine(batchCommand.ToString());
            // conn.ExecuteNonQuery(batchCommand.ToString());
        }
    }
}
1) Simply capture the NpgsqlParameter returned from Add(), and then set its Value property:
var p = cmd.Parameters.Add("p", NpgsqlDbType.Integer);
cmd.Prepare();
p.Value = 8;
cmd.ExecuteNonQuery();
2) You can use AddWithValue() in the same way, but if you're preparing the command in order to reuse it several times, that makes less sense. The idea is that you first add the parameter without a value, then prepare, then execute it several times, setting the value each time.
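For completeness, a small sketch of the three-argument AddWithValue overload the question mentions; it sets the type and value in one call, which is fine for one-shot commands but defeats the add-prepare-set-execute pattern described above:
// Sets NpgsqlDbType and Value together; handy when not preparing for reuse.
var p = cmd.Parameters.AddWithValue("p", NpgsqlDbType.Integer, 8);
cmd.ExecuteNonQuery();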
3) You can prepare a multi-statement command. As things work now, all statements in the command will share the same parameter list (which lives on NpgsqlCommand). So the same pattern holds: create your command with your SQL and parameters, prepare it, and then set parameter values and execute. Each individual statement within your command will run prepared, benefiting from the perf increase.
Here's an example with two-statement batching:
cmd.CommandText = "INSERT INTO tabletest1 (value1,value2) VALUES (#v1,#v2); INSERT INTO tabletest1 (value1, value2) VALUES (#v3,#v4)";
var v1 = cmd.Parameters.Add("v1", NpgsqlDbType.Integer);
var v2 = cmd.Parameters.Add("v2", NpgsqlDbType.Integer);
var v3 = cmd.Parameters.Add("v3", NpgsqlDbType.Integer);
var v4 = cmd.Parameters.Add("v4", NpgsqlDbType.Integer);
cmd.Prepare();
while (...) {
v1.Value = ...;
v2.Value = ...;
v3.Value = ...;
v4.Value = ...;
cmd.ExecuteNonQuery();
}
However, if the objective is to efficiently insert lots of data, consider using COPY instead - it will be faster than even batched inserts.
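A minimal sketch of Npgsql's binary COPY API, assuming the tabletest1 table and the dataRows list from the question above:
using (var writer = conn.BeginBinaryImport(
    "COPY tabletest1 (value1, value2) FROM STDIN (FORMAT BINARY)"))
{
    foreach (var d in dataRows)
    {
        writer.StartRow();                            // begin a new row
        writer.Write(d.value1, NpgsqlDbType.Integer);
        writer.Write(d.value2, NpgsqlDbType.Integer);
    }
    writer.Complete(); // without Complete(), the import is rolled back on dispose
}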
Finally, to complete the picture, for INSERT statements specifically you can include more than one row in a single statement:
INSERT INTO tabletest1 (value1, value2) VALUES (1,2), (3,4)
You can also parameterize the actual values and prepare this command, as in the sketch below. This is similar to batching two INSERT statements, and should be faster (although still slower than COPY).
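A hedged sketch of that approach (the parameter names r0v1 etc. are illustrative, not from the docs): a single two-row INSERT, prepared once, with all four values set per execution:
cmd.CommandText = "INSERT INTO tabletest1 (value1, value2) VALUES (@r0v1, @r0v2), (@r1v1, @r1v2)";
var r0v1 = cmd.Parameters.Add("r0v1", NpgsqlDbType.Integer);
var r0v2 = cmd.Parameters.Add("r0v2", NpgsqlDbType.Integer);
var r1v1 = cmd.Parameters.Add("r1v1", NpgsqlDbType.Integer);
var r1v2 = cmd.Parameters.Add("r1v2", NpgsqlDbType.Integer);
cmd.Prepare();

r0v1.Value = 1; r0v2.Value = 2;
r1v1.Value = 3; r1v2.Value = 4;
cmd.ExecuteNonQuery(); // inserts rows (1,2) and (3,4) in one round trip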
Npgsql 6.0 added support for batching/pipelining via NpgsqlBatch.
Here is an updated example:
await using var connection = new NpgsqlConnection(connString);
await connection.OpenAsync();

var batch = new NpgsqlBatch(connection);
const int count = 10;
const string parameterName = "parameter";
for (int i = 0; i < count; i++)
{
    var batchCommand = new NpgsqlBatchCommand($"SELECT @{parameterName} AS value");
    batchCommand.Parameters.Add(new NpgsqlParameter(parameterName, i));
    batch.BatchCommands.Add(batchCommand);
}
await batch.PrepareAsync();

var results = new List<int>(count);
await using (var reader = await batch.ExecuteReaderAsync())
{
    do
    {
        while (await reader.ReadAsync())
        {
            // GetFieldValueAsync takes an ordinal, so resolve the column name first
            results.Add(await reader.GetFieldValueAsync<int>(reader.GetOrdinal("value")));
        }
    } while (await reader.NextResultAsync());
}
Console.WriteLine(string.Join(", ", results));

best solution for multiple insert update solution

I'm struggling to understand C# & Npgsql as a beginner. Consider the following code example:
// Insert some data
using (var cmd = new NpgsqlCommand())
{
    cmd.Connection = conn;
    cmd.CommandText = "INSERT INTO data (some_field) VALUES (@p)";
    cmd.Parameters.AddWithValue("p", "Hello world");
    cmd.ExecuteNonQuery();
}
The syntax for more than one insert & update statement like this is clear so far:
cmd.CommandText = "INSERT INTO data (some_field) VALUES (#p);INSERT INTO data1...;INSERT into data2... and so on";
But what is the right solution for a loop that should run a single statement repeatedly?
This does not work:
// Insert some data
using (var cmd = new NpgsqlCommand())
{
    foreach (var s in SomeStringCollectionOrWhatever)
    {
        cmd.Connection = conn;
        cmd.CommandText = "INSERT INTO data (some_field) VALUES (@p)";
        cmd.Parameters.AddWithValue("p", s);
        cmd.ExecuteNonQuery();
    }
}
It seems the parameters get "concatenated", or remembered - I cannot see any way to "clear" the existing cmd object.
My second option would be to wrap the whole "using" block inside the loop, but then every cycle would create a new object. That seems ugly to me.
So what is the best solution for my problem?
To insert lots of rows efficiently, take a look at Npgsql's bulk copy feature - the API is more suitable (and more efficient) for inserting large numbers of rows than concatenating INSERT statements into a batch like you're trying to do.
If you want to rerun the same SQL with changing parameter values, you can do the following:
using (var cmd = new NpgsqlCommand("INSERT INTO data (some_field) VALUES (#p)", conn))
{
var p = new NpgsqlParameter("p", DbType.String); // Adjust DbType according to type
cmd.Parameters.Add(p);
cmd.Prepare(); // This is optional but will optimize the statement for repeated use
foreach(var s in SomeStringCollectionOrWhatever)
{
p.Value = s;
cmd.ExecuteNonQuery();
}
}
If you need to insert lots of rows and performance is key, then I would recommend Npgsql's bulk copy capability, as @Shay mentioned. But if you are looking for a quick way to do this without bulk copy, I would recommend using Dapper.
Consider the example below.
Let's say you have a class called Event and a list of events to add.
List<Event> eventsToInsert = new List<Event>
{
    new Event() { EventId = 1, EventName = "Bday1" },
    new Event() { EventId = 2, EventName = "Bday2" },
    new Event() { EventId = 3, EventName = "Bday3" }
};
The snippet that adds the list to the DB is shown below.
var sqlInsert = "Insert into events( eventid, eventname ) values (@EventId, @EventName)";
using (IDbConnection conn = new NpgsqlConnection(cs))
{
    conn.Open();

    // Execute is an extension method supplied by Dapper.
    // It will add all the entries in the eventsToInsert list, matching values
    // by property name. The only caveat is that the POCO's property names must
    // match the placeholder names in the SQL statement.
    conn.Execute(sqlInsert, eventsToInsert);

    // If we want to retrieve the data back into a list:
    List<Event> eventsAdded;
    // This Dapper extension returns an IEnumerable, so I cast it to a List.
    eventsAdded = conn.Query<Event>("Select * from events").ToList();
    foreach (var row in eventsAdded)
    {
        Console.WriteLine($"{row.EventId} {row.EventName} was added");
    }
}
-HTH

SQLiteDataReader GetFieldType() returns Int64, but then fails on GetInt64() - is this a bug or feature?

When reading from an SQLiteDataReader I'm experiencing some odd behaviour whereby GetFieldType(0) returns typeof(Int64), GetValue(0) returns an Int64, but GetInt64(0) fails with a System.InvalidCastException.
It has taken me a rather long time to reproduce this behaviour:
using System;
using System.Data.SQLite;
using NUnit.Framework;

namespace Test
{
    [TestFixture]
    public class SQLiteType
    {
        [Test]
        public void A()
        {
            var sqlConnection = new SQLiteConnection("Data Source=:memory:;Version=3;");
            sqlConnection.Open();

            var create = sqlConnection.CreateCommand();
            create.CommandText = "CREATE TABLE FOO (x INTEGER)";
            create.ExecuteNonQuery();

            var insert = sqlConnection.CreateCommand();
            insert.CommandText = "INSERT INTO FOO VALUES (?)";
            var param = insert.CreateParameter();
            param.Value = new TimeSpan(0); // NOTE: INSERTING TIMESPAN DIRECTLY instead of .Ticks
            insert.Parameters.Add(param);
            insert.ExecuteNonQuery();

            var select = sqlConnection.CreateCommand();
            select.CommandText = "SELECT x FROM FOO";
            var dr = select.ExecuteReader();
            while (dr.Read())
            {
                var valueObject = dr.GetValue(0);
                Assert.AreEqual(typeof(Int64), valueObject.GetType());
                var valueType = dr.GetFieldType(0);
                Assert.AreEqual(typeof(Int64), valueType);
                var value = dr.GetInt64(0); // throws System.InvalidCastException
            }
        }
    }
}
It seems to occur when the row is created by inserting a TimeSpan value directly into an INTEGER column (instead of, e.g., TimeSpan.Ticks, which might be more meaningful). Despite this, the data reader is still telling me that the column is an Int64.
I'm not exactly sure what the contract is for SQLiteDataReader, but I had previously assumed that if GetFieldType() returns typeof(Int64), then GetInt64() should not fail. Perhaps this is not the case? (It seems quite odd that GetValue() still returns an Int64.) Maybe it is an artifact of SQLite's unique dynamic typing system.
Certainly it is not hard to avoid, but for pedagogical reasons I am curious why this is happening.
The root cause may have to do with how types are handled with SQLite:
http://www.sqlite.org/datatype3.html#affinity
Even then, this looks like a bug to me; if:
dr.GetValue(0).GetType() == typeof(System.Int64)
then it should certainly follow that dr.GetInt64(0) doesn't throw an exception. I would send an email to sqlite-users@sqlite.org as described here: http://www.sqlite.org/src/wiki?name=Bug+Reports
Please note though that if you replace:
param.Value = new TimeSpan(0);
with
param.Value = new TimeSpan(0).Ticks;
then
var value = dr.GetInt64(0);
works fine. I'm bringing this up because I'm not sure there is any conversion assumption to make when you assign that TimeSpan. For instance, there is no implicit or explicit conversion from TimeSpan to long.
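As a hedged workaround sketch (my addition, not part of the original answer): if you do need to read the stored value without changing the INSERT, you can go through GetValue() and convert manually, sidestepping GetInt64()'s strict cast:
while (dr.Read())
{
    var raw = dr.GetValue(0);         // returns the Int64 the reader reports
    var ticks = Convert.ToInt64(raw); // manual conversion instead of dr.GetInt64(0)
}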

SQL server refusing to cache plan for a fixed length parameterized IN clause

Using .NET 4.0, I have defined the following SqlCommand. When I execute the command multiple times consecutively without making any changes, SQL Server refuses to cache the query plan.
string[] colors = new string[] { "red", "blue", "yellow", "green" };
string cmdText = "SELECT * FROM ColoredProducts WHERE Color IN ({0})";

string[] paramNames = colors.Select(
    (s, i) => "@color" + i.ToString()
).ToArray();
string inClause = string.Join(",", paramNames);

using (SqlCommand cmd = new SqlCommand(string.Format(cmdText, inClause)))
{
    for (int i = 0; i < paramNames.Length; i++)
    {
        cmd.Parameters.AddWithValue(paramNames[i], colors[i]);
    }
    // Execute query here
}
I know it's refusing to cache the plan because the following query ran in a fraction of the time after consecutive runs:
string[] colors = new string[] { "red", "blue", "yellow", "green" };
string cmdText = "SELECT * FROM ColoredProducts WHERE Color IN ({0})";
string inClause = string.Join(",", colors.Select(c => "'" + c + "'")); // inline string literals

using (SqlCommand cmd = new SqlCommand(string.Format(cmdText, inClause)))
{
    // Execute query here
}
In my actual test case the param list is fixed at a size of exactly 2000. The scenario I am attempting to optimize is selecting a specific set of 2000 records from a very large table. I would like for the query to be as fast as possible so I really want it to cached.
Sleepy post Edit:
The question is: why wouldn't this plan get cached? And yes, I have confirmed that the query is not in the cache using sys.dm_exec_cached_plans and sys.dm_exec_sql_text.
Here is an idea using a table-valued parameter. Please let us know if this approach performs better than your huge string. There are other ideas too, but this is the closest to treating your set of colors as an array.
In SQL Server:
CREATE TYPE dbo.Colors AS TABLE
(
    Color VARCHAR(32) -- be precise here! Match ColoredProducts.Color
    PRIMARY KEY
);
GO

CREATE PROCEDURE dbo.MatchColors
    @colors AS dbo.Colors READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT cp.* -- use actual column names please!
    FROM dbo.ColoredProducts AS cp -- always use schema prefix
    INNER JOIN @colors AS c
        ON cp.Color = c.Color;
END
GO
Now in C#:
DataTable tvp = new DataTable();
tvp.Columns.Add(new DataColumn("Color"));
tvp.Rows.Add("red");
tvp.Rows.Add("blue");
tvp.Rows.Add("yellow");
tvp.Rows.Add("green");
// ...

using (connectionObject)
{
    SqlCommand cmd = new SqlCommand("dbo.MatchColors", connectionObject);
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter tvparam = cmd.Parameters.AddWithValue("@colors", tvp);
    tvparam.SqlDbType = SqlDbType.Structured;
    // execute query here
}
I can almost guarantee this will perform better than an IN list with a large number of parameters, regardless of the length of the actual string in your C# code.

SqlServer better to batch statements or foreach?

Hypothetically, is it better to send N statements to Sql Server (2008), or is it better to send 1 command comprising N statements to Sql Server? In either case, I am running the same statement over a list of objects, and in both cases I would be using named parameters. Suppose my use case is dumping a cache of log items every few hours.
foreach example
var sql = "update blah blah blah where id = #id";
using(var conn = GetConnection())
{
foreach(var obj in myList)
{
var cmd = new SqlCommand()
{CommandText = sql, Connection = conn};
//add params from obj
cmd.ExecuteNonQuery();
}
}
batch example
var sql = #"
update blah blah blah where id = #id1
update blah blah blah where id = #id2
update blah blah blah where id = #id3
-etc";
using (var conn = GetConnection())
{
var cmd = new SqlCommand
{ CommandText = sql, Connection = conn};
for(int i=0; i<myList.Count; i++)
{
//add params: "id" + i from myList[i]
}
cmd.ExecuteNonQuery();
}
In time tests, the batch version took 15% longer than the foreach version for large inputs. I figure the batch version takes longer to execute because the server has to parse a huge statement and bind up to 2000 parameters. Supposing SQL Server is on the LAN, is there any advantage to using the batch method?
Your tests would seem to have given you the answer; however, let me add another consideration. It is preferable to encapsulate the update in a separate function and call it from a foreach:
private void UpdateFoo(int id)
{
    const string sql = "Update Foo Where Id = @Id"; // placeholder SQL, as in the question
    using (var conn = GetConnection())
    {
        conn.Open();
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@Id", id);
            cmd.ExecuteNonQuery();
        }
    }
}

private void UpdateLotsOfFoo()
{
    foreach (var foo in myList)
    {
        UpdateFoo(foo.Id);
    }
}
In this setup you are leveraging connection pooling, which mitigates the cost of opening and closing connections.
@Thomas - this design can increase the overhead of opening and closing connections in a loop. This is not a preferred practice and should be avoided. The code below iterates over the statements while using one connection, and will be easier on resources (both client- and server-side).
private void UpdateAllFoos() // combined into one method so a single connection serves the whole loop
{
    const string sql = "Update Foo Where Id = @Id"; // placeholder SQL, as above
    using (var conn = GetConnection())
    {
        conn.Open();
        foreach (var foo in myList)
        {
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@Id", foo.Id);
                cmd.ExecuteNonQuery();
            }
        }
    }
}
