LINQ Inserts without IDENTITY column - sql-server

I'm using LINQ, but my database tables do not have an IDENTITY column (although they are using a surrogate Primary Key ID column).
Can this work?
To get the identity values for a table, there is a stored procedure called GetIDValueForOrangeTable(), which looks at a SystemValues table and increments the ID therein.
Is there any way I can get LINQ to fetch the ID value from this SystemValues table on an insert, rather than using the built-in IDENTITY?
As an aside, I don't think this is a very good idea, especially not for a web application. I imagine there will be a lot of concurrency conflicts because of this SystemValues lookup. Am I justified in my concern?
Cheers
Duncan

Sure you can make this work with LINQ, and safely, too:
wrap the access to the underlying SystemValues table in the "GetIDValue.....()" function in a TRANSACTION (and not with the READ UNCOMMITTED isolation level!); then one and only one user can access that table at any given time, and you should be able to safely distribute IDs
call that stored proc from LINQ just before saving your entity and store the ID if you're dealing with a new entity (if the ID hasn't been set yet)
store your entity in the database
That should work - I'm not sure it's any faster or more efficient than letting the database handle the work - but it should work, and safely.
Marc
UPDATE:
Something like this (adapt to your needs) will work safely:
CREATE PROCEDURE dbo.GetNextTableID(@TableID INT OUTPUT)
AS BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
-- the UPDATE takes an exclusive lock that is held until COMMIT,
-- so concurrent callers are serialized and each gets a distinct ID
UPDATE dbo.SystemTables
SET MaxTableID = MaxTableID + 1
WHERE ........
SELECT
@TableID = MaxTableID
FROM
dbo.SystemTables
COMMIT TRANSACTION
END
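To consume that from LINQ to SQL (steps 2 and 3 above), here is a minimal sketch - the OrangeDataContext and Orange types and the connection string are hypothetical; if you map the proc in the LINQ to SQL designer, you can call the generated method instead of raw ADO.NET:
using System.Data;
using System.Data.SqlClient;

string connectionString = "...";
int nextId;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetNextTableID", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter idParam = cmd.Parameters.Add("@TableID", SqlDbType.Int);
    idParam.Direction = ParameterDirection.Output;
    conn.Open();
    cmd.ExecuteNonQuery();
    nextId = (int)idParam.Value;              // the freshly reserved surrogate key
}
using (var db = new OrangeDataContext(connectionString))  // hypothetical DataContext
{
    var orange = new Orange { Id = nextId };  // new entity: assign the ID manually
    db.Oranges.InsertOnSubmit(orange);
    db.SubmitChanges();                       // plain insert - no IDENTITY involved
}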
As for performance - as long as you have a reasonable number (less than 50, maybe) of concurrent users, and as long as this SystemTables table isn't used for much else, it should perform OK.

You are very justified in your concern. If two users try to insert at the same time, both might be given the same number unless you do as marc_s described and put the thing in a transaction. However, if the transaction doesn't wrap around your whole insert as well as the table that contains the ID values, you may still have gaps if the outer insert fails (it got a value but then for some other reason didn't insert a record). Since most people do this to avoid gaps (something that is in most cases an unnecessary requirement), it makes life more complicated and still may not achieve the result. Using an identity field is almost always a better choice.

Related

Move complete table to archive in MVC 4

I am working on an MVC 4 project which requires me to move data from an active table to a table for archived content.
I understand that with Entity Framework, the tables are tightly bound to the models. I have created two models - one for the active records and one for the archived records.
What is the best way to add all the data in active table to archive and remove all the contents in active table for fresh use?
P.S.: I am a bit paranoid about the error tolerance here, as I may be dealing with around 30,000 records at a time. I need to move all records to the archive successfully and delete them only after a successful copy.
Even though you are using Entity Framework, you can still use stored procedures. This is a good case for a stored procedure, as you can do a set-based operation in the sproc (fast), rather than iterating through all of the records in code (slow).
Here are some steps for how to add the sproc to the EF (you could just Google this too): Adding stored procedures complex types in Entity Framework
Your sproc would probably look something like:
SET IDENTITY_INSERT dbo.ArchiveTable ON --Assuming you have an identity column
INSERT INTO dbo.ArchiveTable(
Col1
,Col2
)
SELECT
Col1
,Col2
FROM dbo.MainTable
SET IDENTITY_INSERT dbo.ArchiveTable OFF --Assuming you have an identity column
DELETE FROM dbo.MainTable
Wrap that in a transaction (to satisfy your error tolerance) and it should be a pretty quick execution for 30,000+ records. I would recommend returning something like the number of records affected, which you should be able to do from the stored procedure.
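Calling such a sproc from Entity Framework is then a one-liner. A hedged sketch, assuming a DbContext-based model (EF 4.1+); MyDbContext is your context type, and dbo.ArchiveMainTable is a hypothetical name for the sproc above, assumed to end with SELECT @@ROWCOUNT so the caller gets the number of rows moved:
using System.Linq;

using (var context = new MyDbContext())
{
    // the sproc owns its own BEGIN/COMMIT TRANSACTION, so no extra plumbing here
    int rowsArchived = context.Database
        .SqlQuery<int>("EXEC dbo.ArchiveMainTable")
        .Single();
}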
Unless you have to, don't do it in code - wilsjd's answer is right on - create a transacted stored procedure that you call straight from EF.
But if you have a reason to do it in code - say because you don't have a good way to access both tables from within a stored procedure, just be sure to understand and do the right thing with the transaction from within your code. This is a good answer discussing this:
Entity Framework - Using Transactions or SaveChanges(false) and AcceptAllChanges()
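For reference, the pattern from that linked answer looks roughly like this (ObjectContext API; SaveOptions lives in System.Data.Objects). Changes are saved but not accepted until the surrounding transaction commits, so a failure leaves the contexts in a retryable state:
using System.Data.Objects;
using System.Transactions;

using (var scope = new TransactionScope())
{
    // save without discarding the change-tracker state
    context1.SaveChanges(SaveOptions.DetectChangesBeforeSave);
    context2.SaveChanges(SaveOptions.DetectChangesBeforeSave);
    scope.Complete();
    // only now throw away the pending-change bookkeeping
    context1.AcceptAllChanges();
    context2.AcceptAllChanges();
}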

How to do very fast inserts to SQL Server 2008

I have a project that involves recording data from a device directly into a sql table.
I do very little processing in code before writing to SQL Server (2008 Express, by the way).
Typically I use the SqlHelper class's ExecuteNonQuery method and pass in a stored proc name and the list of parameters that the SP expects.
This is very convenient, but I need a much faster way of doing this.
Thanks.
ExecuteNonQuery with an INSERT statement, or even a stored procedure, will get you into the thousands of inserts per second range on Express. 4000-5000/sec are easily achievable; I know this for a fact.
What usually slows down individual inserts is the wait time for the log flush, and you need to account for that. The easiest solution is to simply batch commits, e.g. commit every 1000 inserts, or every second. This will fill up the log pages and amortize the cost of the log flush wait over all the inserts in a transaction.
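A hedged sketch of that batch-commit shape from ADO.NET - the dbo.Readings table, its columns, and the readings collection are made up; the point is the commit-every-1000-rows structure:
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    SqlTransaction tx = conn.BeginTransaction();
    var cmd = new SqlCommand(
        "INSERT INTO dbo.Readings (DeviceId, Value) VALUES (@d, @v)", conn, tx);
    cmd.Parameters.Add("@d", SqlDbType.Int);
    cmd.Parameters.Add("@v", SqlDbType.Float);
    int n = 0;
    foreach (var reading in readings)     // whatever your device hands you
    {
        cmd.Parameters["@d"].Value = reading.DeviceId;
        cmd.Parameters["@v"].Value = reading.Value;
        cmd.ExecuteNonQuery();
        if (++n % 1000 == 0)              // amortize the log flush over 1000 rows
        {
            tx.Commit();
            tx = conn.BeginTransaction();
            cmd.Transaction = tx;
        }
    }
    tx.Commit();                          // commit the remainder
}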
With batch commits you'll probably bottleneck on disk log write performance, about which there is nothing you can do short of changing the hardware (going RAID 0 stripe on the log).
If you hit earlier bottlenecks (unlikely), then you can look into batching statements, i.e. sending one single T-SQL batch with multiple inserts in it. But this seldom pays off.
Of course, you'll need to reduce the size of your writes to a minimum, meaning reduce the width of your table to the minimally needed columns, eliminate non-clustered indexes, and eliminate unneeded constraints. If possible, use a heap instead of a clustered index, since heap inserts are significantly faster than clustered index ones.
There is little need to use the fast insert interface (i.e. SqlBulkCopy). Using ordinary INSERTs and ExecuteNonQuery on batch commits, you'll exhaust the drive's sequential write throughput long before you need to deploy bulk insert. Bulk insert is needed on fast SAN-connected machines, and you mention Express, so that's probably not the case. There is a perception to the contrary out there, but it is simply because people don't realize that bulk insert gives them batch commit, and it's the batch commit that speeds things up, not the bulk insert.
As with any performance test, make sure you eliminate randomness, and preallocate the database and the log; you don't want to hit a database or log growth event during test measurements or in production - that is sooo amateurish.
BULK INSERT would be the fastest, since it is minimally logged.
.NET also has the SqlBulkCopy Class
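A minimal SqlBulkCopy sketch, assuming the same hypothetical dbo.Readings table - you stage rows in a DataTable and hand the whole thing to the server in one go:
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("DeviceId", typeof(int));
table.Columns.Add("Value", typeof(double));
table.Rows.Add(42, 3.14);                 // ...fill from your device buffer
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.Readings";
    bulk.BatchSize = 5000;                // rows per round trip to the server
    bulk.WriteToServer(table);
}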
Here is a good way to insert a lot of records using table variables...
...but it's best to limit it to 1000 records at a time, because table variables are held in memory.
In this example I will insert 2 records into a table with 3 fields -
CustID, Firstname, Lastname
--first create an In-Memory table variable with same structure
--you could also use a temporary table, but it would be slower
declare @MyTblVar table (CustID int, FName nvarchar(50), LName nvarchar(50))
insert into @MyTblVar values (100,'Joe','Bloggs')
insert into @MyTblVar values (101,'Mary','Smith')
Insert into MyCustomerTable
Select * from @MyTblVar
This is typically done by way of a BULK INSERT. Basically, you prepare a file, then issue the BULK INSERT statement, and SQL Server copies all the data from the file to the table with the fastest method possible.
It does have some restrictions (for example, there's no way to do "update or insert" type of behaviour if you have possibly-existing rows to update), but if you can get around those, then you're unlikely to find anything much faster.
If you mean from .NET then use SqlBulkCopy
Things that can slow inserts include indexes and reads or updates (locks) on the same table. You can speed up situations like yours by avoiding both and inserting individual transactions into a separate holding table with no indexes or other activity. Then batch the holding table into the main table a little less frequently.
It can only really go as fast as your SP will run. Ensure that the table(s) are properly indexed and if you have a clustered index, ensure that it has a narrow, unique, increasing key. Ensure that the remaining indexes and constraints (if any) do not have a lot of overhead.
You shouldn't see much overhead in the ADO.NET layer (I wouldn't necessarily use any other .NET library above SqlCommand). You may be able to use the ADO.NET async methods to queue several calls to the stored proc without blocking a single thread in your application (this could potentially free up more throughput than anything else - just like having multiple machines inserting into the database).
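A hedged sketch of that queued-call idea using the classic Begin/End async pattern - dbo.InsertReading is a made-up sproc, and on .NET versions before 4.5 the connection string must include Asynchronous Processing=true:
using System.Data;
using System.Data.SqlClient;

var conn = new SqlConnection(connectionString);   // "Asynchronous Processing=true" on older .NET
var cmd = new SqlCommand("dbo.InsertReading", conn) { CommandType = CommandType.StoredProcedure };
cmd.Parameters.AddWithValue("@DeviceId", 42);
conn.Open();
cmd.BeginExecuteNonQuery(ar =>
{
    try { cmd.EndExecuteNonQuery(ar); }   // completes on a pool thread
    finally { conn.Dispose(); }
}, null);
// meanwhile the calling thread is free to queue the next insert immediately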
Other than that, you really need to tell us more about your requirements.

How to avoid circular relationship in SQL-Server?

I am creating a self-related table:
Table Item columns:
ItemId int - PK;
Amount money - not null;
Price money - a computed column using a UDF that retrieves a value according to the item's ancestors' Amount.
ParentItemId int - nullable, reference to another ItemId in this table.
I need to avoid a loop, meaning a row cannot become an ancestor of its own ancestors; for example, if ItemId = 2 has ParentItemId = 1, then ItemId = 1 having ParentItemId = 2 shouldn't be allowed.
I don't know what should be the best practice in this situation.
I think I should add a check constraint that gets a scalar value from a UDF, or whatever else.
EDIT:
Another option is to create an INSTEAD OF trigger and put in one transaction the update of the ParentItemId field and the selection of the Price field from the @@RowIdentity; if that fails, cancel the transaction. But I would prefer a UDF validating.
Any ideas are sincerely welcomed.
Does this definitely need to be enforced at the database level?
I'm only asking as I have databases like this (where the table similar to this is like a folder) and I only make sure that the correct parent/child relationships are set up in the application.
Checks like this are not easy to implement, and the possible solutions could cause a lot of bugs and problems that may be harder than the initial one. Usually it is enough to add controls on the user's input and prevent infinite loops when reading the data.
If your application uses stored procedures and no ORM, then I would choose to implement this logic in an SP. Otherwise, handle it in the other layers, not in the DB.
How big of a problem is this, in real life? It can be expensive to detect these situations (using a trigger, perhaps). In fact, it's likely going to cost you a lot of effort, on each transaction, when only a tiny subset of all your transactions would ever cause this problem.
Think about it first.
A simple trick is to force the ParentItemId to be less than the ItemId. This prevents loop closure in this simple context.
However, there's a downside - if you need for some reason to delete/insert a parent, you may need to delete/insert all of its children in order as well.
Equally, hierarchies need to be inserted in order, and you may not be able to reassign a parent.
Tested and works just great:
CREATE TRIGGER Item_UPDATE
ON Item
FOR INSERT, UPDATE
AS
BEGIN
BEGIN TRY
-- selecting the computed Price column forces its UDF to walk the ancestor chain;
-- if the new ParentItemId closes a loop, that walk fails (recursion/nesting limit)
SELECT Price FROM INSERTED
END TRY
BEGIN CATCH
RAISERROR('This item cannot be specified with this parent.', 16, 1)
ROLLBACK TRANSACTION;
END CATCH
END
GO

What should be returned when inserting into SQL?

A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE, too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is the better way to do it - and I don't mean the implementation of either method - just which is better: return just the Id, or return the whole row?
We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.
The select statement will have some sort of advantage only if the data is generated in the procedure. Otherwise the data that you have inserted is generally available to you already, so there is no point in selecting and returning it again, IMHO. If it's for the id, you can get it with SCOPE_IDENTITY(), which will return the last identity value created in the current scope for the insert.
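A minimal sketch of that id-only approach from ADO.NET (the dbo.Widgets table is hypothetical) - the SELECT rides along in the same batch as the INSERT, so there is no extra round trip:
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.Widgets (Name) VALUES (@name); SELECT CAST(SCOPE_IDENTITY() AS int);", conn))
{
    cmd.Parameters.AddWithValue("@name", "Widget");
    conn.Open();
    int newId = (int)cmd.ExecuteScalar(); // the identity value just generated, scoped to this batch
}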
Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows - names, dollars, whatever. But a few minutes' reflection and reading the prior 6 (hmm, make that 5) replies, leads to a number of “it depends” situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex than that is when there are simple default values assigned during an insert statement. “DateCreated” columns that default to the current datetime, or “CreatedBy” that default to the current SQL login, are a prime example. I’d include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to not be synchronized with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you’d need to pass those back.
And then there are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you’re just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exist? If they’re there and they matter, plan accordingly; if you do not or cannot know, then you might need to “follow up” on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from “Widget” to “Widget_201001270901”.)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)
Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has the @@IDENTITY property. You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF @return_whole_row = 1
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
ELSE
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.id
VALUES ('value1', 'value2')
I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parametrize a single proc to determine whether to return it over the wire or not. I doubt the DB overhead is significant (single row, got to have a PK lookup), but the data on the wire from the DB to the client could be significant when all added up, and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.

Audit each inserted row in a Trigger

I am trying to implement audit history by adding triggers to my tables and inserting rows into my Audit table. I have a stored procedure that makes doing the inserts a bit easier because it saves code; I don't have to write out the entire insert statement, but instead execute the stored procedure with a few parameters for the columns I want to insert.
I am not sure how to execute a stored procedure for each of the rows in the "inserted" table. I think maybe I need to use a cursor, but I'm not sure. I've never used a cursor before.
Since this is an audit, I am going to need to compare the old and new values for each column to see if they changed. If a value did change, I will execute the stored procedure that adds a row to my Audit table.
Any thoughts?
I would trade space for time and not do the comparison. Simply push the new values to the audit table on insert/update. Disk is cheap.
Also, I'm not sure what the stored procedure buys you. Can't you do something simple in the trigger like:
insert into dbo.mytable_audit
select *, getdate(), getdate(), 'create' from inserted
Where the trigger runs on insert, and you are adding created time, last updated time, and modification type fields. For an update it's a little trickier, since you'll need to supply a named column list, as the created time shouldn't be updated:
insert into dbo.mytable_audit (col1, col2, ...., last_updated, modification)
select col1, col2, ...., getdate(), 'update' from inserted
Also, are you planning to audit only successes, or failures as well? If you want to audit failures, you'll need something other than triggers, I think, since the trigger won't run if the transaction is rolled back - and you won't have the status of the transaction if the trigger runs first.
I've actually moved my auditing to my data access layer and do it in code now. It makes it easier to do both success and failure auditing, and (using reflection) it is pretty easy to copy the fields to the audit object. The other thing it allows me to do is give the user context, since I don't give the actual user permissions to the database and run all queries using a service account.
If your database needs to scale past a few users this will become very expensive. I would recommend looking into 3rd party database auditing tools.
There is already a built-in function, UPDATE(), which tells you whether a column has changed (but it is over the entire set of inserted rows).
You can look at some of the techniques in Paul Nielsen's AutoAudit triggers which are code generated.
What it does is check both:
IF UPDATE(<column_name>)
INSERT Audit (...)
SELECT ...
FROM Inserted
JOIN Deleted
ON Inserted.KeyField = Deleted.KeyField -- (AutoAudit does not support multi-column primary keys, but the technique can be done manually)
AND NOT (Inserted.<column_name> = Deleted.<column_name> OR COALESCE(Inserted.<column_name>, Deleted.<column_name>) IS NULL)
But it audits each column change as a separate row. I use it for auditing changes to configuration tables. I am not currently using it for auditing heavy change tables. (But in most transactional systems I've designed, rows on heavy activity tables are typically immutable, you don't have a lot of UPDATEs, just a lot of INSERTs - so you wouldn't even need this kind of auditing). For instance, orders or ledger entries are never changed, and shopping carts are disposable - neither would have this kind of auditing. On low volume change tables, like customer, you can use this kind of auditing.
Jeff,
I agree with Zodeus - a good option is to use a 3rd party tool.
I have used auditdatabase, a free web tool that generates audit triggers (you do not need to write a single line of T-SQL code).
Another good tool is ApexSQL Audit, but it's not free.
I hope this helps you,
F. O'Neill
