SQL CLR Trigger - get source table - sql-server

I am creating a DB synchronization engine using SQL CLR Triggers in Microsoft SQL Server 2012. These triggers do not call a stored procedure or function (and thereby have access to the INSERTED and DELETED pseudo-tables but do not have access to the ##procid).
Differences here, for reference.
This "sync engine" uses mapping tables to determine what the table and field maps are for this sync job. In order to determine the target table and fields (from my mapping table) I need to get the source table name from the trigger itself. I have come across many answers on Stack Overflow and other sites that say that this isn't possible. But, I've found one website that provides a clue:
Potential Solution:
using (SqlConnection lConnection = new SqlConnection(#"context connection=true")) {
SqlCommand cmd = new SqlCommand("SELECT object_name(resource_associated_entity_id) FROM sys.dm_tran_locks WHERE request_session_id = ##spid and resource_type = 'OBJECT'", lConnection);
cmd.CommandType = CommandType.Text;
var obj = cmd.ExecuteScalar();
}
This does in fact return the correct table name.
Question:
My question is, how reliable is this potential solution? Is the ##spid actually limited to this single trigger execution? Or is it possible that other simultaneous triggers will overlap within this process id? Will it stand up to multiple executions of the same and/or different triggers within the database?
From these sites, it seems the process Id is in fact limited to the open connection, which doesn't overlap: here, here, and here.
Will this be a safe method to get my source table?
Why?
As I've noticed similar questions, but all without a valid answer for my specific situation (except that one). Most of the comments on those sites ask "Why?", and in order to preempt that, here is why:
This synchronization engine operates on a single DB and can push changes to target tables, transforming the data with user-defined transformations, automatic source-to-target type casting and parsing and can even use the CSharpCodeProvider to execute methods also stored in those mapping tables for transforming data. It is already built, quite robust and has good performance metrics for what we are doing. I'm now trying to build it out to allow for 1:n table changes (including extension tables requiring the same Id as the 'master' table) and am trying to "genericise" the code. Previously each trigger had a "target table" definition hard coded in it and I was using my mapping tables to determine the source. Now I'd like to get the source table and use my mapping tables to determine all the target tables. This is used in a medium-load environment and pushes changes to a "Change Order Book" which a separate server process picks up to finish the CRUD operation.
Edit
As mentioned in the comments, the query listed above is quite "iffy". It will often (after a SQL Server restart, for example) return system objects like syscolpars or sysidxstats. But, it seems that in the dm_tran_locks table there's always an associated resource_type of 'RID' (Row ID) with the same object_name. My current query which works reliably so far is the following (will update if this changes or doesn't work under high load testing):
select t1.ObjectName FROM (
SELECT object_name(resource_associated_entity_id) as ObjectName
FROM sys.dm_tran_locks WHERE resource_type = 'OBJECT' and request_session_id = ##spid
) t1 inner join (
SELECT OBJECT_NAME(partitions.OBJECT_ID) as ObjectName
FROM sys.dm_tran_locks
INNER JOIN sys.partitions ON partitions.hobt_id = dm_tran_locks.resource_associated_entity_id
WHERE resource_type = 'RID'
) t2 on t1.ObjectName = t2.ObjectName
If this is always the case, I'll have to find that out during testing.

How reliable is this potential solution?
While I do not have time to set up a test case to show it not working, I find this approach (even taking into account the query in the Edit section) "iffy" (i.e. not guaranteed to always be reliable).
The main concerns are:
cascading (whether recursive or not) Trigger executions
User (i.e. Explicit / Implicit) transactions
Sub-processes (i.e. EXEC and sp_executesql)
These scenarios allow for multiple objects to be locked, all at the same time.
Is the ##SPID actually limited to this single trigger execution? Or is it possible that other simultaneous triggers will overlap within this process id?
and (from a comment on the question):
I think I can join my query up with the sys.partitions and get a dm_trans_lock that has a type of 'RID' with an object name that will match up to the one in my original query.
And here is why it shouldn't be entirely reliable: the Session ID (i.e. ##SPID) is constant for all of the requests on that Connection). So all sub-processes (i.e. EXEC calls, sp_executesql, Triggers, etc) will all be on the same ##SPID / session_id. So, between sub-processes and User Transactions, you can very easily get locks on multiple resources, all on the same Session ID.
The reason I say "resources" instead of "OBJECT" or even "RID" is that locks can occur on: rows, pages, keys, tables, schemas, stored procedures, the database itself, etc. More than one thing can be considered an "OBJECT", and it is possible that you will have page locks instead of row locks.
Will it stand up to multiple executions of the same and/or different triggers within the database?
As long as these executions occur in different Sessions, then they are a non-issue.
ALL THAT BEING SAID, I can see where simple testing would show that your current method is reliable. However, it should also be easy enough to add more detailed tests that include an explicit transaction that first does some DML on another table, or have a trigger on one table do some DML on one of these tables, etc.
Unfortunately, there is no built-in mechanism that provides the same functionality that ##PROCID does for T-SQL Triggers. I have come up with a scheme that should allow for getting the parent table for a SQLCLR Trigger (that takes into account these various issues), but haven't had a chance to test it out. It requires using a T-SQL trigger, set as the "first" trigger, to set info that can be discovered by the SQLCLR Trigger.
A simpler form can be constructed using CONTEXT_INFO, if you are not already using it for something else (and if you don't already have a "first" Trigger set). In this approach you would still create a T-SQL Trigger, and then set it as the "first" Trigger using sp_settriggerorder. In this Trigger you SET CONTEXT_INFO to the table name that is the parent of ##PROCID. You can then read CONTEXT_INFO() on a Context Connection in a SQLCLR Trigger. If there are multiple levels of Triggers then the value of CONTEXT INFO will get overwritten, so reading that value must be the first thing you do in each SQLCLR Trigger.

This is an old thread, but it is an FAQ and I think I have a better solution. Essentially it uses the schema of the inserted or deleted table to find the base table by doing a hash of the column names and comparing the hash with the hashes of tables with a CLR trigger on them.
Code snippet below - at some point I will probably put the whole solution on Git (it sends a message to Azure Service Bus when the trigger fires).
private const string colqry = "select top 1 * from inserted union all select top 1 * from deleted";
private const string hashqry = "WITH cols as ( "+
"select top 100000 c.object_id, column_id, c.[name] "+
"from sys.columns c "+
"JOIN sys.objects ot on (c.object_id= ot.parent_object_id and ot.type= 'TA') " +
"order by c.object_id, column_id ) "+
"SELECT s.[name] + '.' + o.[name] as 'TableName', CONVERT(NCHAR(32), HASHBYTES('MD5',STRING_AGG(CONVERT(NCHAR(32), HASHBYTES('MD5', cols.[name]), 2), '|')),2) as 'MD5Hash' " +
"FROM cols "+
"JOIN sys.objects o on (cols.object_id= o.object_id) "+
"JOIN sys.schemas s on (o.schema_id= s.schema_id) "+
"WHERE o.is_ms_shipped = 0 "+
"GROUP BY s.[name], o.[name]";
public static void trgSendSBMsg()
{
string table = "";
SqlCommand cmd;
SqlDataReader rdr;
SqlTriggerContext trigContxt = SqlContext.TriggerContext;
SqlPipe p = SqlContext.Pipe;
using (SqlConnection con = new SqlConnection("context connection=true"))
{
try
{
con.Open();
string tblhash = "";
using (cmd = new SqlCommand(colqry, con))
{
using (rdr = cmd.ExecuteReader(CommandBehavior.SingleResult))
{
if (rdr.Read())
{
MD5 hash = MD5.Create();
StringBuilder hashstr = new StringBuilder(250);
for (int i=0; i < rdr.FieldCount; i++)
{
if (i > 0) hashstr.Append("|");
hashstr.Append(GetMD5Hash(hash, rdr.GetName(i)));
}
tblhash = GetMD5Hash(hash, hashstr.ToString().ToUpper()).ToUpper();
}
rdr.Close();
}
}
using (cmd = new SqlCommand(hashqry, con))
{
using (rdr = cmd.ExecuteReader(CommandBehavior.SingleResult))
{
while (rdr.Read())
{
string hash = rdr.GetString(1).ToUpper();
if (hash == tblhash)
{
table = rdr.GetString(0);
break;
}
}
rdr.Close();
}
}
if (table.Length == 0)
{
p.Send("Error: Unable to find table that CLR trigger is on. Message not sent!");
return;
}
….
HTH

Related

Concurrent updates on a single staging table

I am developing a service application (VB.NET) which pulls information from a source and imports it to a SQL Server database
The process can involve one or more “batches” of information at a time (the number and size of batches in any given “run” is arbitrary based on a queue maintained elsewhere)
Each batch is assigned an identifier (BatchID) so that the set of records in the staging table which belong to that batch can be easily identified
The ETL process for each batch is sequential in nature; the raw data is bulk inserted to a staging table and then a series of stored procedures perform updates on a number of columns until the data is ready for import
These stored procedures are called in sequence by the service and are generally simple UPDATE commands
Each SP takes the BatchID as an input parameter and specifies this as the criteria for inclusion in each UPDATE, á la :
UPDATE dbo.stgTable
SET FieldOne = (CASE
WHEN S.[FieldOne] IS NULL
THEN T1.FieldOne
ELSE
S.[FieldOne]
END
)
, FieldTwo = (CASE
WHEN S.[FieldTwo] IS NULL
THEN T2.FieldTwo
ELSE
S.[FieldTwo]
END
)
FROM dbo.stgTable AS S
LEFT JOIN dbo.someTable T1 ON S.[SomeField] = T1.[SomeField]
LEFT JOIN dbo.someOtherTable T2 ON S.[SomeOtherField] = T2.[SomeOtherField]
WHERE S.BatchID = #BatchID
Some of the SP’s also refer to functions (both scalar and table-valued) and all incorporate a TRY / CATCH structure so I can tell from the output parameters if a particular SP has failed
The final SP is a MERGE operation to move the enriched data from the staging table into the production table (again, specific to the provided BatchID)
I would like to thread this process in the service so that a large batch doesn’t hold up smaller batches in the same run
I figured there should be no issue with this as no thread could ever attempt to process records in the staging table that could be targeted by another thread (no race conditions)
However, I’ve noticed that, when I do thread the process, arbitrary steps on arbitrary batches seem to fail (but no error is recorded from the output of the SP)
The failures are inconsistent; e.g. sometimes batches 2, 3 & 5 will fail (on SP’s 3, 5 & 7 respectively), other times it will be different batches, each at different steps in the sequence
When I import the batches sequentially, they all import perfectly fine – always!
I can’t figure out if this is an issue on the service side (VB.NET) – e.g. is each thread opening an independent connection to the DB or could they be sharing the same one (I’ve set it up that each one should be independent…)
Or if the issue is on the SQL Server side – e.g. is it not feasible for concurrent SP calls to manipulate data on the same table, even though, as described above, no thread/batch will ever touch records belonging to another thread/batch
(On this point – I tried using CTE’s to create subsets of data from the staging table based on the BatchID and apply the UPDATE’s to those instead but the exact same behaviour occurred)
WITH CTE AS (
SELECT *
FROM dbo.stgTable
WHERE BatchID = #BatchID
)
UPDATE CTE...
Or maybe the problem is that multiple SP’s are calling the same function at the same time and that is why one or more of them are failing (I don’t see why that would be a problem though?)
Any suggestions would be very gratefully received – I’ve been playing around with this all week and I can’t for the life of me determine precisely what the problem might be!
Update to include sample service code
This is the code in the service class where the threading is initiated
For Each ItemInScope In ScopedItems
With ItemInScope
_batches(_batchCount) = New Batch(.Parameter1, .Parameter2, .ParameterX)
With _batches(_batchCount)
If .Initiate() Then
_doneEvents(_batchCount) = New ManualResetEvent(False)
Dim _batchWriter = New BatchWriter(_batches(_batchCount), _doneEvents(_batchCount))
ThreadPool.QueueUserWorkItem(AddressOf _batchWriter.ThreadPoolCallBack, _batchCount)
Else
_doneEvents(_batchCount) = New ManualResetEvent(True)
End If
End With
End With
_batchCount += 1
Next
WaitHandle.WaitAll(_doneEvents)
Here is the BatchWriter class
Public Class BatchWriter
Private _batch As Batch
Private _doneEvent As ManualResetEvent
Public Sub New(ByRef batch As Batch, ByVal doneEvent As ManualResetEvent)
_batch = batch
_doneEvent = doneEvent
End Sub
Public Sub ThreadPoolCallBack(ByVal threadContext As Object)
Dim threadIndex As Integer = CType(threadContext, Integer)
With _batch
If .PrepareBatch() Then
If .WriteTextOutput() Then
.ProcessBatch()
End If
End If
End With
_doneEvent.Set()
End Sub
End Class
The PrepareBatch and WriteTextOutput functions of the Batch class are entirely contained within the service application - it is only the ProcessBatch function where the service starts to interact with the database (via Entity Framework)
Here is that function
Public Sub ProcessScan()
' Confirm that a file is ready for import
If My.Computer.FileSystem.FileExists(_filePath) Then
Dim dbModel As New DatabaseModel
With dbModel
' Pass the batch to the staging table in the database
If .StageBatch(_batchID, _filePath) Then
' First update (results recorded for event log)
If .UpdateOne(_batchID) Then
_stepOneUpdates = .RetUpdates.Value
' Second update (results recorded for event log)
If .UpdateTwo(_batchID) Then
_stepTwoUpdates = .RetUpdates.Value
' Third update (results recorded for event log)
If .UpdateThree(_batchID) Then
_stepThreeUpdates = .RetUpdates.Value
....
End Sub

is there something faster than Enumerable.Except<TSource> Method?

I have a program that downloads data from server database to client database. server database keeps growing recently.
in that program, there is an option to select download all data OR download data for a specific time period (can select backward days from today). if the user selects all, I wrote the program to truncate client database table and insert all data using bulk copy. that part is ok.
but the problem is when user select a specific time period (each recode has created data time ) program has to compare two tables and divide recodes (server data) in two tables. one is, not exist data and the second one is not existing data. and what I'm going to do is,
not existing data directly insert into client DB (i'm using bulk insert) and Existing data inserting into a tempory table using bulkcopy and after update client's table using the above tempory table. My actual problem occurs when dividing server's table. this is how I did it
updateTable = (From c In dt_from_server.AsEnumerable()
Join o In Dt_from_client.AsEnumerable()
On c.Field(Of String)("BARCODE").Trim() Equals o.Field(Of String)("BARCODE").Trim()
And c.Field(Of String)("ITEM_CODE").Trim() Equals o.Field(Of String)("ITEM_CODE").Trim()
Select c).CopyToDataTable()
insertTable = dt_server.AsEnumerable()
.Except(updateTable.AsEnumerable(), DataRowComparer.Default)
.CopyToDataTable()
(normally there is over 1M recodes in the server table )
when there is over 1 Milion recodes, Update part taking acceptable time like 10 minutes (Yes it taking 5GB space from Ram - in this case, it's ok when considering performance )
but insert part seams taking days, just to assing the insertTable(datatable). this is the issue.
AsEnumerable().Except() part taking long time and I couldn't find a solution speedup this process. I'm not sure I explained this correctly. Could anyone can give me some advice for this?
Since you have commented that dt_from_server and dt_server are actually the same DataTable you don't need to compare all values of all DataRows with each other, which is what DataRowComparer.Default does. You can use Except without second parameter for the comparer, then only references are compared which is much faster.
You also don't need two CopyToDataTable which creates two additonal big DataTables in memory, process the rows one after the other.
Here is a different approach using Linq's left-outer join, which is more efficient:
Dim query = from rServ in dt_from_server.AsEnumerable()
group join rClient in Dt_from_client.AsEnumerable()
On New With{
Key .BarCode = rServ.Field(Of String)("BARCODE").Trim(),
Key .ItemCode = rServ.Field(Of String)("ITEM_CODE").Trim()
} Equals New With{
Key .BarCode = rClient.Field(Of String)("BARCODE").Trim(),
Key .ItemCode = rClient.Field(Of String)("ITEM_CODE").Trim()
} into Group
From client In Group.DefaultIfEmpty()
Select new With { .ServerRow = rServ, .InsertRow = client is Nothing }
Dim insertOrUpdateRows = query.ToLookup(Function(x) x.InsertRow, Function(x) x.ServerRow)
Dim insertRows = insertOrUpdateRows(true).CopyToDataTable() 'CopyToDataTable redundant if you process rows immediately now'
Dim updateRows = insertOrUpdateRows(false).CopyToDataTable() 'CopyToDataTable redundant if you process rows immediately now'
But in general the most scalable and efficient approach would be to not load all into memory at once and then process all, but to use database paging(or a stored-procedure) to process only parts of it in memory, otherwise it's likely that you will encounter a OutOfMemoryException sooner or later.
C# as requested:
var query = from rServ in dt_from_server.AsEnumerable()
join rClient in Dt_from_client.AsEnumerable()
on new { BarCode = rServ.Field<string>("BARCODE").Trim(), ItemCode = rServ.Field<string>("ITEM_CODE").Trim() }
equals new { BarCode = rClient.Field<string>("BARCODE").Trim(), ItemCode = rClient.Field<string>("ITEM_CODE").Trim() }
into clientGroup
from client in clientGroup.DefaultIfEmpty()
select new { ServerRow = rServ, InsertRow = client == null };
var insertOrUpdateRows = query.ToLookup(x => x.InsertRow, x => x.ServerRow);
var insertRows = insertOrUpdateRows[true].CopyToDataTable(); // CopyToDataTable redundant if you process rows immediately now
var updateRows = insertOrUpdateRows[false].CopyToDataTable(); // CopyToDataTable redundant if you process rows immediately now

Is there a gain in efficiency if I put a lengthy SQL select into a stored procedure?

I have the following SQL Server 2012 query:
var sql = #"Select question.QuestionUId
FROM Objective,
ObjectiveDetail,
ObjectiveTopic,
Problem,
Question
where objective.examId = 1
and objective.objectiveId = objectiveDetail.objectiveId
and objectiveDetail.ObjectiveDetailId = ObjectiveTopic.ObjectiveDetailId
and objectiveTopic.SubTopicId = Problem.SubTopicId
and problem.ProblemId = question.ProblemId";
var a = db.Database.SqlQuery<string>(sql).ToList();
Can someone help explain to me if it would be a good idea to put this into a
stored procedure and if so then how could I do that and then call it from my C# code. It was
suggested to me that if it is in a stored procedure then it would run more
efficiently as it would not be recompiled often. Is that the case?
Yes, there is. For starters, a stored procedure is precompiled and stored within your database. Being precompiled, the database engine can execute it more efficiently, since no on-the-fly compilation necessary. Also, database optimizations can be added to support a precompiled procedure. A stored procedure also allows business logic to be encapsulated within the database.
If you decide to go the stored procedure route, then consider the following:
First of all, you will need to create a stored procedure that encapsulates your existing SQL query.
CREATE PROCEDURE ListQuestionIds
#ExamId int
AS
BEGIN
SELECT Question.QuestionUId
FROM Objective
INNER JOIN ObjectiveDetail
ON ( Objective.objectiveId = ObjectiveDetail.objectiveId )
INNER JOIN ObjectiveTopic
ON ( ObjectiveDetail.ObjectiveDetailId = ObjectiveTopic.ObjectiveDetailId )
INNER JOIN Problem
ON ( ObjectiveTopic.SubTopicId = Problem.SubTopicId )
INNER JOIN Question
ON ( Problem.ProblemId = Question.ProblemId )
WHERE Objective.examId = #ExamId;
END;
Please make sure that the tables called by your procedure (Objective, Problem, etc,) have all of the relevant primary keys and indexes in place to enhance the performance of your query.
Next, you will need to call that stored procedure from within your C# code. One way--but by no means the only way--is to create a connection to your database using the SqlConnection object and then executing your procedure via the SqlCommand object.
I would recommend that you take a look at How to execute a stored procedure within C# program for some on-topic examples. But a simple example of such might look like:
string connectionString = "your_connection_string";
using (var con = new SqlConnection(connectionString))
{
using (var cmd = new SqlCommand("ListQuestionIds", con)) {
cmd.CommandType = CommandType.StoredProcedure
cmd.Parameters.Add(new SqlParameter("#ExamId", examId))
con.Open();
using (SqlDataReader rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
// Loop through the returned SqlDataReader object (aka. rdr) and
// then evaluate & process the returned question id value(s) here
}
}
}
}
Please note that this sample code does not (intentionally) include any error handling. I leave that up to you to integrate into your application.
Finally, just as an FYI... many of the more modern ORMs (e.g., Entity Framework, NHibernate, etc.) allow you to execute stored procedure-like queries from your C# code without requiring an explicit stored procedure. If you are already using an ORM in your application, then you may want to forgo the stored procedure altogether. Whatever you decide to do, a little research on your end will help you make an informed decision.
I hope this helps you get started. Good luck.

How to create VIEW in MS Access Database using Delphi Application without installing MSAccess on PC?

I want to create VIEW definitions on MS Access. I have used following CREATE VIEW Statement:
SELECT
MFP.FollowUpPlan_Id,
MFP.FollowUpPlan_Name AS PlanName,
DFP.Sequence_No AS SequenceNo,
MFS.FollowUpSchedule_Name AS ScheduleName
FROM
MAS_FollowUp_Plan AS MFP,
DET_FollowUp_Plan AS DFP,
MAS_FollowUp_Schedule AS MFS
WHERE
(((MFP.FollowUpPlan_Id)=DFP.FollowUpPlan_Id) AND
((DFP.FollowUpSchedule_Id)=MFS.FollowUpSchedule_Id)) AND
MFP.is_Deleted = FALSE AND
DFP.is_Deleted = false
ORDER BY
MFP.FollowUpPlan_Id, DFP.Sequence_No;
but it throw an error:
Only Simple Select Queries are allowed in view.
Please Help, Thanks in Advance.
The issue here, as Jeroen explained, is a limitation of Access' CREATE VIEW statement. For this case, you can use CREATE PROCEDURE instead. It will create a new member of the db's QueryDefs collection --- so from the Access user interface will appear as a new named query.
The following statement worked for me using ADO from VBScript. From previous Delphi questions on here, my understanding is that Delphi can also use ADO, so I believe this should work for you, too.
CREATE PROCEDURE ViewSubstitute AS
SELECT
MFP.FollowUpPlan_Id,
MFP.FollowUpPlan_Name AS PlanName,
DFP.Sequence_No AS SequenceNo,
MFS.FollowUpSchedule_Name AS ScheduleName
FROM
(MAS_FollowUp_Plan AS MFP
INNER JOIN DET_FollowUp_Plan AS DFP
ON MFP.FollowUpPlan_Id = DFP.FollowUpPlan_Id)
INNER JOIN MAS_FollowUp_Schedule AS MFS
ON DFP.FollowUpSchedule_Id = MFS.FollowUpSchedule_Id
WHERE
MFP.is_Deleted=False AND DFP.is_Deleted=False
ORDER BY
MFP.FollowUpPlan_Id,
DFP.Sequence_No;
You cannot mix ORDER BY with JOIN when creating views in Access. It will get you the error "Only simple SELECT queries are allowed in VIEWS." (note the plural VIEWS)
Having multiple tables in the FROM is a kind of to JOIN.
either remove the ORDER BY,
or have only one table in the FROM and no JOINs.
I remember from the past (when I did more Access stuff than now) seeing this for a large query with a single table select with an ORDER BY as well.
The consensus is that you should not have ORDER BY in views anyway, so that is your best thing to do.
Another reason that you can get the same error message is if you add parameters or sub selects. Access does not like those in views either, but that is not the case in your view.
Declare variable olevarCatalog ,cmd as OleVariant in Delphi, Uses ComObj
olevarCatalog := CreateOleObject('ADOX.Catalog');
olevarCatalog.create(YourConnectionString); //This Will create MDB file.
// Using ADO Query(CREATE TABLE TABLEName....) add the required Tables.
// To Insert View Definition on MDB file.
cmd := CreateOleObject('ADODB.Command');
cmd.CommandType := cmdText;
cmd.CommandText := 'ANY Kind of SELECT Query(JOIN, OrderBy is also allowed)';
olevarCatalog.Views.Append('Name of View',cmd);
cmd := null;
This is a best way to Create MS ACCESS File(.MDB) and VIEWs using Delphi.

Correct method of deleting over 2100 rows (by ID) with Dapper

I am trying to use Dapper support my data access for my server app.
My server app has another application that drops records into my database at a rate of 400 per minute.
My app pulls them out in batches, processes them, and then deletes them from the database.
Since data continues to flow into the database while I am processing, I don't have a good way to say delete from myTable where allProcessed = true.
However, I do know the PK value of the rows to delete. So I want to do a delete from myTable where Id in #listToDelete
Problem is that if my server goes down for even 6 mintues, then I have over 2100 rows to delete.
Since Dapper takes my #listToDelete and turns each one into a parameter, my call to delete fails. (Causing my data purging to get even further behind.)
What is the best way to deal with this in Dapper?
NOTES:
I have looked at Tabled Valued Parameters but from what I can see, they are not very performant. This piece of my architecture is the bottle neck of my system and I need to be very very fast.
One option is to create a temp table on the server and then use the bulk load facility to upload all the IDs into that table at once. Then use a join, EXISTS or IN clause to delete only the records that you uploaded into your temp table.
Bulk loads are a well-optimized path in SQL Server and it should be very fast.
For example:
Execute the statement CREATE TABLE #RowsToDelete(ID INT PRIMARY KEY)
Use a bulk load to insert keys into #RowsToDelete
Execute DELETE FROM myTable where Id IN (SELECT ID FROM #RowsToDelete)
Execute DROP TABLE #RowsToDelte (the table will also be automatically dropped if you close the session)
(Assuming Dapper) code example:
conn.Open();
var columnName = "ID";
conn.Execute(string.Format("CREATE TABLE #{0}s({0} INT PRIMARY KEY)", columnName));
using (var bulkCopy = new SqlBulkCopy(conn))
{
bulkCopy.BatchSize = ids.Count;
bulkCopy.DestinationTableName = string.Format("#{0}s", columnName);
var table = new DataTable();
table.Columns.Add(columnName, typeof (int));
bulkCopy.ColumnMappings.Add(columnName, columnName);
foreach (var id in ids)
{
table.Rows.Add(id);
}
bulkCopy.WriteToServer(table);
}
//or do other things with your table instead of deleting here
conn.Execute(string.Format(#"DELETE FROM myTable where Id IN
(SELECT {0} FROM #{0}s", columnName));
conn.Execute(string.Format("DROP TABLE #{0}s", columnName));
To get this code working, I went dark side.
Since Dapper makes my list into parameters. And SQL Server can't handle a lot of parameters. (I have never needed even double digit parameters before). I had to go with Dynamic SQL.
So here was my solution:
string listOfIdsJoined = "("+String.Join(",", listOfIds.ToArray())+")";
connection.Execute("delete from myTable where Id in " + listOfIdsJoined);
Before everyone grabs the their torches and pitchforks, let me explain.
This code runs on a server whose only input is a data feed from a Mainframe system.
The list I am dynamically creating is a list of longs/bigints.
The longs/bigints are from an Identity column.
I know constructing dynamic SQL is bad juju, but in this case, I just can't see how it leads to a security risk.
Dapper request the List of object having parameter as a property so in above case a list of object having Id as property will work.
connection.Execute("delete from myTable where Id in (#Id)", listOfIds.AsEnumerable().Select(i=> new { Id = i }).ToList());
This will work.

Resources