I am working on a Web API project that uses Entity Framework 6 to do a "bulk" insert of fewer than 500 records at a time into a Microsoft SQL Server table. The DbContext.SaveChanges() method inserts all the records into a plain table in a couple of seconds, so I have no issues with that. However, when the method is called to insert the same number of records into the same table with a semi-extensive trigger attached to it, the process can take many minutes. The trigger performs some table joins, inserts into other tables, and then deletes the newly inserted record.
I do not have much control of the table or the trigger, so I am looking for suggestions on how to improve performance. I made a suggestion to move the trigger to a stored procedure and have the trigger call the stored procedure, but I am uncertain if that will achieve any gains.
EDIT: As I understand my question was kind of generic, I will post some of my code in case it helps. The SQL is not mine, so I will see what I can actually post.
Here is the part of my Web API method that does the call to SaveChanges():
string[] stringArray = results[0].Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
var profileObjs = db.Set<T_ProfileStaging>();
foreach (var s in stringArray)
{
string[] columns = s.Split(new[] {",", "\t"}, StringSplitOptions.None);
if (columns.Length == 9) // nine fields (indexes 0-8) are read below
{
T_ProfileStaging profileObj = new T_ProfileStaging();
profileObj.CompanyCode = columns[0];
profileObj.SubmittedBy = columns[1];
profileObj.VersionName = columns[2];
profileObj.DMName = columns[3];
profileObj.Zone = columns[4];
profileObj.DMCode = columns[5];
profileObj.ProfileName = columns[6];
profileObj.Advertiser = columns[7];
profileObj.OriginalInsertDate = columns[8];
profileObjs.Add(profileObj);
}
}
try
{
db.SaveChanges();
return Ok();
}
catch (Exception e)
{
return Content(HttpStatusCode.BadRequest, "SQL Server Insert Exception");
}
When you save with SaveChanges(), EF sends each row in a separate INSERT statement. SQL Server triggers fire once per statement, so the trigger runs once for each row.
To work around this, you can either
use a bulk-load API from the client instead of EF's SaveChanges(): SqlBulkCopy directly, or one of the many EF extensions that wrap it,
or
configure EF to insert into a different (staging) table and then run a single INSERT ... SELECT into the target table, so the trigger fires only once.
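A minimal sketch of the first option, assuming the nine string fields from the question and a destination table named dbo.T_ProfileStaging (the table name, the class name and the all-string column types are assumptions, not taken from the original schema). Note that SqlBulkCopy skips insert triggers unless SqlBulkCopyOptions.FireTriggers is passed; with that option the trigger runs once per batch instead of once per row, so the trigger itself has to cope with a multi-row inserted set.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class ProfileStagingLoader // hypothetical helper, not part of the question's code
{
    private static readonly string[] ColumnNames =
    {
        "CompanyCode", "SubmittedBy", "VersionName", "DMName", "Zone",
        "DMCode", "ProfileName", "Advertiser", "OriginalInsertDate"
    };

    // 'rows' holds the already-split values, nine strings per record.
    public static void BulkInsert(string connectionString, IEnumerable<string[]> rows)
    {
        var table = new DataTable();
        foreach (var name in ColumnNames)
        {
            table.Columns.Add(name, typeof(string));
        }
        foreach (var columns in rows)
        {
            table.Rows.Add((object[])columns); // one DataRow per record
        }

        // FireTriggers makes the server run insert triggers for the bulk load.
        using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.FireTriggers))
        {
            bulkCopy.DestinationTableName = "dbo.T_ProfileStaging"; // assumed table name
            foreach (var name in ColumnNames)
            {
                bulkCopy.ColumnMappings.Add(name, name);
            }
            bulkCopy.WriteToServer(table); // default BatchSize 0 sends all rows as one batch
        }
    }
}
```

If the trigger was written with single-row inserts in mind, it would need to be reviewed before switching to this approach.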
I have an invoice database with an ID IDENTITY column that SQL Server autogenerates, incrementing by one (+1) each time a new record is created by a LINQ insert.
The code that I am currently using to create a new record is posted below and the Incremental ID is autogenerated by SQL Server.
public async Task<IActionResult> Create([Bind(
"PurchaseOrder,InvDate,DelDate,PaidDate,AgentName,FullName,FirstName, LastName,CustId,CompanyName,ClientRole,Email,Phone,Address," +
"City,State,Zip,Country,ProdCode,Description,Quantity,UnitPrice,LineTotal,OrderTotal,Tax,Discount,Credit," +
"Shipping,GrandTotal,Deposit,AmtDue,DiscAmt,Notes,Published,Year,Expenses,ProjectDescription,ClientProvision")]
CreateNewOrderViewModel cnq)
{
int invId;
try
{
await _context.AddAsync(cnq);
await _context.SaveChangesAsync();
}
catch (InvalidCastException e)
{
ViewBag.Result = $"Database insert failed with: {e}";
return View();
}
}
My issue is with the SQL Server IDENTITY column. Every time the server is rebooted, the next identity value jumps by 1000 instead of incrementing by 1. For example, if my last record was 1001, the next record created will be 2001 instead of 1002. This happens every time the server is updated and has to be rebooted. I searched for an answer and found that it is caused by SQL Server's identity cache, which preallocates identity values and can lose the unused ones when the server restarts.
Since I am on a shared hosting server, I do not have full control of the server; I only have dbo rights on my own database. I was wondering if there is a way to use LINQ to generate the incremental value for a new InvID column that I can then use as the record ID, instead of the SQL Server generated value.
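If you don't have enough control over the database to address the SQL Server identity gap issue, you can manually read the largest Id from the table and increment it by 1 for the new record. Be aware that two concurrent requests can read the same maximum, so this needs a unique constraint or a transaction to be safe.
Sample code to retrieve the last Id:
// Max over a nullable projection returns null for an empty table instead of throwing.
int lastId = _context.TableA.Max(t => (int?)t.Id) ?? 0;
int newId = lastId + 1;
// Add the new record with newId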
After reading several recommendations, I decided to check and reseed the identity before running the LINQ insert.
using (var connection = new SqlConnection(_data.DATA))
{
connection.Open();
try
{
var seed = new SqlCommand("CreateIDSeed", connection)
{
CommandType = CommandType.StoredProcedure
};
seed.ExecuteNonQuery();
await _context.AddAsync(cnq);
await _context.SaveChangesAsync();
}
catch (InvalidCastException e)
{
ViewBag.Result = $"Database insert failed with: {e}";
return View();
}
}
Where the CreateIDSeed looks like this:
CREATE PROCEDURE [dbo].[CreateIDSeed]
AS
BEGIN
SET NOCOUNT ON;
DECLARE @newId int;
SELECT @newId = MAX(ID) FROM dbo.CreateNewOrderViewModel;
DBCC CHECKIDENT('dbo.CreateNewOrderViewModel', RESEED, @newId);
END
GO
I tried inserting a few test records and it seems to be working, but I will know more when the Server is rebooted.
I'm trying to use LINQ to SQL for integration testing of stored procedures. I call a stored procedure that performs an update and then retrieve the updated row from the database to verify the change. All of this should happen in one transaction so that I can roll it back after the verification.
The code fails at the assert, because the row I retrieve does not seem to be updated. I know that my SP works when called from ordinary code. Is it even possible to see the updated row within the same transaction?
I'm using SQL Server 2008 and used sqlmetal.exe to create the LINQ to SQL mapping.
I've tried many different things, and right now my code looks like this:
DbTransaction transaction = null;
try
{
var context =
new DbConnection(
ConfigurationManager.ConnectionStrings["MyConnectionString"].ConnectionString);
context.Connection.Open();
transaction = context.Connection.BeginTransaction();
context.Transaction = transaction;
const string newUserName= "TestUserName";
context.SpUpdateUserName(136049 , newUserName);
context.SubmitChanges();
// select to verify
var user=
(from d in context.Users where d.NUserId == 136049 select d).First();
Assert.IsTrue(user.UserName == newUserName);
}
finally
{
if (transaction != null) transaction.Rollback();
}
I believe you are coming across a stale DataContext issue.
Your update is done through a stored procedure so your context does not "see" the changes and has no way to update the Users.
If you use a new datacontext to do the assert, it usually works well. However, since you are using a transaction you probably have to add the second datacontext to the same transaction.
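A rough sketch of what enlisting a second DataContext in the same transaction could look like. It assumes the sqlmetal-generated entity classes use attribute mapping (the default), and the helper name ReadFresh is made up for illustration. The second context reuses the first one's open connection and transaction, so it reads the uncommitted changes made by the stored procedure.

```csharp
using System;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;

public static class TransactionalVerification // hypothetical helper
{
    // Reads one entity through a fresh DataContext that shares the original
    // context's connection and transaction, bypassing its identity cache.
    public static TEntity ReadFresh<TEntity>(
        DataContext original,
        Expression<Func<TEntity, bool>> predicate)
        where TEntity : class
    {
        // Deliberately not disposed here: the connection and transaction
        // are owned by 'original', which rolls back and cleans up.
        var fresh = new DataContext(original.Connection)
        {
            Transaction = original.Transaction,
            ObjectTrackingEnabled = false // read-only use
        };
        return fresh.GetTable<TEntity>().First(predicate);
    }
}
```

In the test from the question this would be called as something like ReadFresh<User>(context, d => d.NUserId == 136049), where User stands for whatever entity class sqlmetal generated for the Users table.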
We're having a strange problem in Oracle. I'll sketch some (simplified) context first:
Consider this mapping to an Entity:
public EntityMap()
{
Table("EntityTable");
Id(x => x.Id)
.Column("entityID")
.GeneratedBy.Native("ENTITYID").UnsavedValue(0);
Map(x => x.SomeBoolean).Column("SomeBoolean");
}
and this code:
var entity = new Entity();
using (var transaction = new TransactionScope(TransactionScopeOption.Required))
{
Session.Save(entity);
transaction.Complete();
}
//A lot of code
if(someCondition)
{
using (var transaction = new TransactionScope(TransactionScopeOption.Required))
{
entity.SomeBoolean = true;
Session.Update(entity);
transaction.Complete();
}
}
This code is called a few times. The first time it generates the following queries:
select ENTITYID.nextval from dual
INSERT INTO Entity
(SomeBoolean, EntityID)
VALUES (0, 1216)
UPDATE Entity
SET SomeBoolean = 1
WHERE EntityID = 1216
The second time it is called these queries are generated (someCondition is false)
select ENTITYID.nextval from dual
INSERT INTO Entity
(SomeBoolean, EntityID)
VALUES (0, 1217)
And now the trouble begins. From now on, each insert will use the correct autoincremented value, but the update will always use 1217
select ENTITYID.nextval from dual
INSERT INTO Entity
(SomeBoolean, EntityID)
VALUES (0, 1218)
UPDATE Entity
SET SomeBoolean = 1
WHERE EntityID = 1217
And of course, this is not what we want to happen. If I inspect the value of the Id while debugging, it contains the correct autoincremented value. Somehow, deep in the bowels of NHibernate, the incorrect ID is assigned to the WHERE clause...
The strange part is that this only happens on Oracle. If I switch NHibernate to MsSql, everything works like a charm.
So I found out what happened. NHibernate changed its default connection release mode between versions 1.x and 2.x. Instead of closing the connection when the session is disposed, the connection is now closed after each transaction. However, we were manually coordinating our transactions, which apparently caused trouble in Oracle.
This question has some extra information and this entry in the NHibernate documentation also clarifies how the connections are handled:
As of NHibernate, if your application manages transactions through .NET APIs such as System.Transactions library, ConnectionReleaseMode.AfterTransaction may cause NHibernate to open and close several connections during one transaction, leading to unnecessary overhead and transaction promotion from local to distributed. Specifying ConnectionReleaseMode.OnClose will revert to the legacy behavior and prevent this problem from occuring.
This blog post is what got me looking in the right direction.
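For reference, a sketch of how the release mode could be switched back to the legacy behavior in code, assuming a standard NHibernate Configuration object (with Fluent NHibernate the same call would typically go inside ExposeConfiguration). The class name is illustrative only.

```csharp
using NHibernate.Cfg;

public static class SessionFactoryConfig // hypothetical wrapper
{
    public static Configuration Build()
    {
        // Reads hibernate.cfg.xml or the app.config section, if present.
        var cfg = new Configuration().Configure();

        // "connection.release_mode" = "on_close": keep the connection open
        // until the session is closed instead of releasing it after each transaction.
        cfg.SetProperty(NHibernate.Cfg.Environment.ReleaseConnections, "on_close");
        return cfg;
    }
}
```

The same setting can also be expressed as a connection.release_mode property in the XML configuration.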
I am trying to use Dapper to support the data access for my server app.
Another application drops records into my database at a rate of 400 per minute.
My app pulls them out in batches, processes them, and then deletes them from the database.
Since data continues to flow into the database while I am processing, I don't have a good way to say delete from myTable where allProcessed = true.
However, I do know the PK values of the rows to delete, so I want to run delete from myTable where Id in @listToDelete
The problem is that if my server goes down for even 6 minutes, I have over 2100 rows to delete.
Since Dapper takes my @listToDelete and turns each item into a separate parameter, and SQL Server allows at most 2100 parameters per command, my delete call fails. (This causes my data purging to fall even further behind.)
What is the best way to deal with this in Dapper?
NOTES:
I have looked at Table-Valued Parameters, but from what I can see they are not very performant. This piece of my architecture is the bottleneck of my system and it needs to be very, very fast.
One option is to create a temp table on the server and then use the bulk load facility to upload all the IDs into that table at once. Then use a join, EXISTS or IN clause to delete only the records that you uploaded into your temp table.
Bulk loads are a well-optimized path in SQL Server and it should be very fast.
For example:
Execute the statement CREATE TABLE #RowsToDelete(ID INT PRIMARY KEY)
Use a bulk load to insert keys into #RowsToDelete
Execute DELETE FROM myTable where Id IN (SELECT ID FROM #RowsToDelete)
Execute DROP TABLE #RowsToDelete (the table will also be dropped automatically when you close the session)
(Assuming Dapper) code example:
conn.Open();
var columnName = "ID";
conn.Execute(string.Format("CREATE TABLE #{0}s({0} INT PRIMARY KEY)", columnName));
using (var bulkCopy = new SqlBulkCopy(conn))
{
bulkCopy.BatchSize = ids.Count;
bulkCopy.DestinationTableName = string.Format("#{0}s", columnName);
var table = new DataTable();
table.Columns.Add(columnName, typeof (int));
bulkCopy.ColumnMappings.Add(columnName, columnName);
foreach (var id in ids)
{
table.Rows.Add(id);
}
bulkCopy.WriteToServer(table);
}
//or do other things with your table instead of deleting here
conn.Execute(string.Format(@"DELETE FROM myTable where Id IN
(SELECT {0} FROM #{0}s)", columnName));
conn.Execute(string.Format("DROP TABLE #{0}s", columnName));
To get this code working, I went dark side.
Dapper turns my list into individual parameters, and SQL Server can't handle that many parameters (I have never needed even double-digit parameters before), so I had to go with dynamic SQL.
So here was my solution:
string listOfIdsJoined = "("+String.Join(",", listOfIds.ToArray())+")";
connection.Execute("delete from myTable where Id in " + listOfIdsJoined);
Before everyone grabs their torches and pitchforks, let me explain.
This code runs on a server whose only input is a data feed from a Mainframe system.
The list I am dynamically creating is a list of longs/bigints.
The longs/bigints are from an Identity column.
I know constructing dynamic SQL is bad juju, but in this case, I just can't see how it leads to a security risk.
Dapper also accepts a list of objects whose properties match the parameter names, so in the case above a list of objects with an Id property will work.
connection.Execute("delete from myTable where Id in (#Id)", listOfIds.AsEnumerable().Select(i=> new { Id = i }).ToList());
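This works because Dapper runs the statement once per object in the list, so each execution carries a single parameter. The cost is one DELETE round trip per ID.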
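Another way to stay under the 2100-parameter limit, not proposed in the answers above, is to keep Dapper's list expansion but split the IDs into batches. The sketch below assumes the table and column names from the question; the class name, the batch size of 2000 and the helper names are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using Dapper;

public static class BatchedDelete // hypothetical helper
{
    // SQL Server allows at most 2100 parameters per command; stay below it.
    public const int MaxIdsPerBatch = 2000;

    // Splits the sequence into lists of at most 'batchSize' items.
    public static IEnumerable<List<long>> SplitIntoBatches(IEnumerable<long> ids, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<long>(batchSize);
        foreach (var id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<long>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // final partial batch
    }

    // Issues one DELETE per batch; Dapper expands @Ids into one parameter per item.
    public static int DeleteByIds(IDbConnection connection, IEnumerable<long> ids)
    {
        int deleted = 0;
        foreach (var batch in SplitIntoBatches(ids, MaxIdsPerBatch))
        {
            deleted += connection.Execute(
                "delete from myTable where Id in @Ids", new { Ids = batch });
        }
        return deleted;
    }
}
```

With 4500 pending IDs this sends three statements instead of 4500, and every statement stays parameterized.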
I have a console app in C# that extracts 20 fields from an Oracle DB with the code below, and I want an efficient way to insert them into SQL Server 2005.
I don't want to insert each of the 20,000 rows individually within the while loop, obviously. I was thinking of changing the code to use a DataSet to cache all the records and then do a bulk insert...
Thoughts?
Pseudocode would be nice since I am new to Oracle.
This is my code, where I was testing getting a connection to Oracle and seeing if I can view the data. Now that I can view it, I want to get it out and into SQL Server 2005. What do I do from here?
static void getData()
{
string connectionString = GetConnectionString();
using (OracleConnection connection = new OracleConnection())
{
connection.ConnectionString = connectionString;
connection.Open();
OracleCommand command = connection.CreateCommand();
string sql = "SELECT * FROM BUG";
command.CommandText = sql;
OracleDataReader reader = command.ExecuteReader();
while (reader.Read())
{
//string myField = (string)reader["Project"];
string myField = reader[0].ToString();
Console.WriteLine(myField);
}
}
}
You can create a CSV file and then use BULK INSERT to insert the file into SQL Server. Have a look here for an example.
The "bulk" insert with the cached Dataset will work exactly like the while loop you are not wanting to write! The problem is that you'll lose control of the process if you try to use the "bulk" insert of the Dataset class. It is extraneous work in the end.
Maybe the best solution is to use a DataWriter so that you have complete control and no Dataset overhead.
You can actually send 100-1000 inserts per SQL batch. Just generate multiple INSERT statements, then submit them together. Pregenerate the next SELECT batch while the first executes.
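One more option that fits the code in the question: SqlBulkCopy (available since .NET 2.0) can stream rows straight from any IDataReader, so the OracleDataReader can be handed to it without an intermediate DataSet or CSV file. This is only a sketch; the class name, the batch size and the destination table name are placeholders, and it assumes the destination columns line up with the SELECT by ordinal position.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class OracleToSqlServerCopy // hypothetical helper
{
    // 'source' is any open IDataReader, for example the OracleDataReader
    // returned by command.ExecuteReader() in the question.
    public static void Copy(IDataReader source, string sqlServerConnectionString, string destinationTable)
    {
        using (var bulkCopy = new SqlBulkCopy(sqlServerConnectionString))
        {
            bulkCopy.DestinationTableName = destinationTable;
            bulkCopy.BatchSize = 5000;    // rows sent per batch (illustrative value)
            bulkCopy.BulkCopyTimeout = 0; // no time limit for the copy
            // Without explicit ColumnMappings, columns are matched by ordinal position.
            bulkCopy.WriteToServer(source);
        }
    }
}
```

Inside getData() the while loop would then be replaced by a single call such as OracleToSqlServerCopy.Copy(reader, sqlConnectionString, "dbo.Bug"), where the connection string and table name are whatever the target database uses.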