When does it make sense to use ##IDENTITY and not SCOPE_IDENTITY? - sql-server

According to MSDN, ##IDENTITY returns the last identity value generated for any table in the current session, across all scopes.
Has anyone come across a situation where this functionality was useful? I can't think of a situation when you would want the last ID generated for any table across all scopes or how you would use it.
UPDATE:
Not sure what all the downvotes are about, but I figured I'd try to clarify what I'm asking.
First off I know when to use SCOPE_IDENTITY and IDENT_CURRENT. What I'm wondering is when would it be better to use ##IDENTITY as opposed to these other options? I have yet to find a place to use it in my day to day work and I'm wondering if someone can describe a situation when it is the best option.
Most of the time when I see it, it is because someone doesn't understand what they were doing, but I assume Microsoft included it for a reason.

In general, it shouldn't be used. SCOPE_IDENTITY() is far safer to use (as long as we're talking about single-row inserts, as highlighted in a comment above), except in the following scenario, where ##IDENTITY is one approach that can be used (SCOPE_IDENTITY() cannot in this case):
your code knowingly fires a trigger
the trigger inserts into a table with an identity column
the calling code needs the identity value generated inside the trigger
the calling code can guarantee that the ##IDENTITY value generated within the trigger will never be changed to reflect a different table (e.g. someone adds logging to the trigger after the insert you were relying on)
This is an odd use case, but feasible.
Keep in mind this won't work if you are inserting multiple rows and need multiple IDENTITY values back. There are other ways, but they also require the option to allow result sets from cursors, and IIRC this option is being deprecated.

##IDENTITY is fantastic for identifying individual rows based on an ID column.
ID | Name | Age
1 AA 20
2 AB 30
etc...
In this case the ID column would be reliant on the ##IDENTITY property.

Related

Way to persist function result as a constant

I needed to create a function today which will always return the exact same value on the specific database it's executed on. It may / may not be the same across databases which is why it has to be able to load it from a table the first time it's required.
CREATE FUNCTION [dbo].[PAGECODEGET] ()
RETURNS nvarchar(6)
AS
BEGIN
DECLARE #PageCode nvarchar(6) = ( SELECT PCO_IDENTITY FROM PAGECODES WHERE PCO_PAGE = 'SWTE' AND PCO_TAB = 'RECORD' )
RETURN #PageCode
END
The PCO_IDENTITY field is a sql identity field, so once the record is inserted for the first time, it's always going to return the same result thereafter.
My question is, is there any way to persist this value to something equivalent to a C# readonly variable?
From a perfomance point of view I know sql will optimise the plan etc, but from a best practice point of view I'm thinking there may possibly be a better way of doing it.
We use a mix of SQL Servers, but the lowest is 2008 R2 in case there's a version specific solution.
I'm afraid there's no such thing as a global variable like you suggest in SQL Server.
As you've pointed out, the function will potentially return different results on another database, depending on a variety of factors, such as when the row was inserted, what other values exist in the table already etc. - basically, the PCO_IDENTITY value for this row cannot be relied upon to be consistent.
A few observations:
I don't see how getting this value occasionally is really going to be a performance bottleneck. I don't think best practices cover this, as selecting a value from a table is as basic as you can get.
If this is part of another larger query, you will probably get better performance by using a join to the PAGECODES table directly, rather than potentially running this function for every row
However, if you are really worried:
There are objects in the database which are persistant - tables. When you first insert this value, retrieve the PCO_IDENTITY value, and create a new table with just that in, that you may join to in your queries. Seems a bit of a waste for one value, doesn't it? (Note you could also make a view, but how would that be any better performing than the function you started with?)
You could force these values into a row with a specific PCO_IDENTITY value, using IDENTITY_INSERT. That way the value is consistent, and you know what it is - you could hard code it in your queries. (NB: Turn IDENTITY_INSERT off again afterwards, and other rows inserted into this table will continue to be automatically generated again)
TL;DR: How you are doing it is probably fine. I suspect you are trying to optimise something that isn't a problem. As always - if in doubt, try out a few approaches and measure.

Does every record has an unique field in SQL Server?

I'm working in Visual Studio - VB.NET.
My problem is that I want to delete a specific row in SQL Server but the only unique column I have is an Identity that increments automatically.
My process of work:
1. I add a row in the column (the identity is being incremented, but I don't know the number)
2. I want to delete the previous row
Is there a sort of unique ID that every new record has?
It's possible that my table has 2 exactly the same records, just the sequence (identity) is different.
Any ideas how to handle this problem?
SQL Server has a few functions that return the generated ID for the last rows, each with it's own specific strengths and weaknesses.
Basically:
##IDENTITY works if you do not use triggers
SCOPE_IDENTITY() works for the code you explicitly called.
IDENT_CURRENT(‘tablename’) works for a specific table, across all scopes.
In almost all scenarios SCOPE_IDENTITY() is what you need, and it's a good habit to use it, opposed to the other options.
A good discussion on the pros and cons of the approaches is also available here.
I want to delete the previous row
And that is your problem. There is no such concept in SQL as a 'previous row'. The word previous implies order and order applies only to queries, where is achieved by adding an ORDER BY clause. Tables have no order. You need to rephrase this in terms of "I need to delete the record that satisfies <this> condition.". This may sound to you like pedantic gibberish, but you will never find a solution until you acknowledged the problem.
Searching for a way to interpret the value of the inserted identity column and then subtracting 1 from it is flawed with many many many problems. It is incorrect under concurrency. It is incorrect in presence of rollbacks. It is incorrect after ETL jobs. Overall, never expect monotonically increasing identities, they're free to jump gaps and your code should be correct in presence of gaps.

Postgresql: keep 2 sequences synchronized

Is there a way to keep 2 sequences synchronized in Postgres?
I mean if I have:
table_A_id_seq = 1
table_B_id_seq = 1
if I execute SELECT nextval('table_A_id_seq'::regclass)
I want that table_B_id_seq takes the same value of table_A_id_seq
and obviously it must be the same on the other side.
I need 2 different sequences because I have to hack some constraints I have in Django (and that I cannot solve there).
The two tables must be related in some way? I would encapsulate that relationship in a lookup table containing the sequence and then replace the two tables you expect to be handling with views that use the lookup table.
Just use one sequence for both tables. You can't keep them in sync unless you always sync them again and over again. Sequences are not transaction safe, they always roll forwards, never backwards, not even by ROLLBACK.
Edit: one sequence is also not going to work, doesn't give you the same number for both tables. Use a subquery to get the correct number and use just a single sequence for a single table. The other table has to use the subquery.
My first thought when seeing this is why do you really want to do this? This smells a little spoiled, kinda like milk does after being a few days expired.
What is the scenario that requires that these two seq stay at the same value?
Ignoring the "this seems a bit odd" feelings I'm getting in my stomach you could try this:
Put a trigger on table_a that does this on insert.
--set b seq to the value of a.
select setval('table_b_seq',currval('table_a_seq'));
The problem with this approach is that is assumes only a insert into table_a will change the table_a_seq value and nothing else will be incrementing table_a_seq. If you can live with that this may work in a really hackish fashion that I wouldn't release to production if it was my call.
If you really need this, to make it more robust make a single interface to increment table_a_seq such as a function. And only allow manipulation of table_a_seq via this function. That way there is one interface to increment table_a_seq and you should also put
select setval('table_b_seq',currval('table_a_seq')); into that function. That way no matter what, table_b_seq will always be set to be equal to table_a_seq. That means removing any grants to the users to table_a_seq and only granting them execute grant on the new function.
You could put an INSERT trigger on Table_A that executes some code that increases Table_B's sequence. Now, every time you insert a new row into Table_A, it will fire off that trigger.

What should be returned when inserting into SQL?

A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces, SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is better way to do it, and I don't mean the implementation of either method. Just which is better, return just the Id, or return the whole row?
We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.
The select statement will have some sort of an advantage only if the data is generated in the procedure. Otherwise the data that you have inserted is generally available to you already so no point in selecting and returning again, IMHO. if its for the id then you can have it with SCOPE_IDENTITY(), that will return the last identity value created in the current session for the insert.
Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows--names, dollars, whatever. But a few minutes reflection and reading the prior 6 (hmm, make that 5) replies, leads to a number of “it depends” situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex that that is when there are simple default values assigned during an insert statement. “DateCreated” columns that default to the current datetime, or “CreatedBy” that default to the current SQL login, are a prime example. I’d include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to not be synchronized with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you’d need to pass those back.
And then there are are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you’re just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exists? If they’re there and they matter, plan accordingly; if you do not or cannot know, then you might need to “follow up” on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from “Widget” to “Widget_201001270901”.)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)
Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has the ##Identity property (see here). You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF #return_whole_row = 1
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
ELSE
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.id
VALUES ('value1', 'value2')
FI
I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parametrize a single proc to determine whther to return it over the wire or not. I doubt the DB overhead is significant (single-row, got to have a PK lookup), but the data on the wire from DB to client could be significant when all added up and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.

Reason for using ##identity rather than scope_identity

On a SQL Server 2005 database, one of our remote developers just checked in a change to a stored procedure that changed a "select scope_identity" to "select ##identity". Do you know of any reasons why you'd use ##identity over scope_identity?
##IDENTITY will return the last identity value issued by the current session. SCOPE_IDENTITY() returns the last identity value in the current session and same scope. They are usually the same, but assume a trigger is called which inserted something somewhere just before the current statement. ##IDENTITY will return the identity value by the INSERT statement of the trigger, not the insert statement of the block. It's usually a mistake unless he knows what he's doing.
Here is a link that may help differentiate them
looks like:
IDENTITY - last identity on the connection
SCOPE_IDENTITY - last identity you explicitly created (excludes triggers)
IDENT_CURRENT(’tablename’) - Last Identity in table regardless of scope or connection.
I can't think of any, unless there was a trigger then inserted a row (or somesuch) and I really really wanted the id of that trigger-inserted row rather than the row I physically changed.
In other words, no, not really.
DISCLAIMER: Not a T-SQL expert :)
Maybe you should ask the developer their rationale behind making the change.
If you wanted the trigger use you could get another trigger added on is the only reason I can come up with. Even then it's dangerous as another trigger could be added and again you would get the wrong identity. I suspect the developer doesn't know what he is doing. But honestly the best thing to do is to ask him why he made the change. You could change it back, but the developer needs to know not to do that again unless he needs the trigger identity as you may not catch it the next time.

Resources