Why does SQL Server prevent inserting in WHEN MATCHED of MERGE?

Does anyone know why SQL Server prevents inserting from within the WHEN MATCHED clause of a MERGE statement? I understand that the documentation only allows updates or deletes; I'm wondering why this is the case so I can understand MERGE better.
Look at this post for an example.

If you are trying to merge your source to your target, it does not make sense to insert a line if it was found in the target. You may want to update or delete it though. Inserting what is already there would create duplicates.

Since you want to INSERT when you find a MATCH, I presume the condition of the ON clause is met but another field is different. Consider adding this field to the ON clause with AND to differentiate between rows that are already present and rows that still need to be inserted.

Common sense says: if you already have the record there, why would you want to insert it again? Not to mention that the "matching" is normally on a non-duplicated key.
If you can find a situation where a matched record needs to be inserted again, let us know so we can help you.

I think one such case is keeping track of history, e.g. for a Telephone field: I want to see what the telephone number was before it was changed.
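For the history case, one approach that stays within MERGE's rules is to let WHEN MATCHED do the UPDATE and capture the old values through the OUTPUT clause, inserting them into a separate history table. The following is a minimal T-SQL sketch under assumptions: the tables dbo.Customer, dbo.CustomerStaging, and dbo.CustomerHistory and all their columns are hypothetical and not taken from the question.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Customer (
    CustomerId int         NOT NULL PRIMARY KEY,
    Telephone  varchar(30) NOT NULL
);
CREATE TABLE dbo.CustomerStaging (
    CustomerId int         NOT NULL PRIMARY KEY,
    Telephone  varchar(30) NOT NULL
);
CREATE TABLE dbo.CustomerHistory (
    CustomerId   int         NOT NULL,
    OldTelephone varchar(30) NOT NULL,
    ChangedAt    datetime2   NOT NULL
);

-- The inner MERGE updates or inserts into dbo.Customer.
-- Its OUTPUT rows feed the outer INSERT, which keeps only the
-- rows that were updated and stores their previous telephone value.
INSERT INTO dbo.CustomerHistory (CustomerId, OldTelephone, ChangedAt)
SELECT changes.CustomerId, changes.OldTelephone, SYSDATETIME()
FROM (
    MERGE dbo.Customer AS t
    USING dbo.CustomerStaging AS s
        ON t.CustomerId = s.CustomerId
    WHEN MATCHED AND t.Telephone <> s.Telephone THEN
        UPDATE SET Telephone = s.Telephone
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Telephone)
        VALUES (s.CustomerId, s.Telephone)
    OUTPUT $action, deleted.CustomerId, deleted.Telephone
) AS changes (MergeAction, CustomerId, OldTelephone)
WHERE changes.MergeAction = N'UPDATE';
```

The filter on $action matters because rows inserted by the MERGE have no deleted values. Note that SQL Server places restrictions on the target of such a nested INSERT (for example, it cannot have triggers or take part in foreign key relationships), so this may not fit every schema.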

Related

SQL query returns duplicate value after joining tables

I need help using SUM and GROUP BY in SQL Server.
I am generating the query based on 5 tables. I have tried in SQL Server.
Some parts of the query are working, but when I advance the query, I get the wrong results/data.
The problem is that the data is counted twice instead of once for every GROUP BY field, e.g. farmer_ID, when the farmer has two or more records.
This happens when I add more tables to the join; with one or two tables, the sum values are okay.
Hence I get farmer_sales = 200 instead of 100.
Kindly let me know how I can get some help
Thanks
David   
You can use an outer join (LEFT or RIGHT) and choose the table that has only one record for each item.
Another solution is to use the DISTINCT keyword before the column name.
Nobody here will be able to help without table definitions, the requirements & the query.
With these issues I find it helpful to first write the query so that it returns exactly the rows you need. It sounds like you have either dodgy data or incomplete join conditions, but it's not possible to tell without the above. You can debug your data by picking one problem (e.g. farmer_sales) and working through the raw data and the query from there. You will have an incomplete PK/FK relationship in your query or a missing constraint allowing bad data. Or you have misunderstood the requirement, or the requirement does not make sense for the data model.
Once you have the query working correctly you can add the aggregations.
One bit of general advice I can give is that adding DISTINCT is almost always the wrong approach.
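A common cause of the doubled sums described in the question is join fan-out: when a farmer has one sales row and two rows in another joined table, the sales row appears twice before SUM runs. The sketch below reproduces that and shows one way around it, aggregating each child table in a derived table before joining. The schema (dbo.farmers, dbo.sales, dbo.payments) is an assumption, since the question gives no table definitions.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE dbo.farmers  (farmer_id int NOT NULL PRIMARY KEY);
CREATE TABLE dbo.sales    (farmer_id int NOT NULL, amount int NOT NULL);
CREATE TABLE dbo.payments (farmer_id int NOT NULL, amount int NOT NULL);

INSERT INTO dbo.farmers  (farmer_id)         VALUES (1);
INSERT INTO dbo.sales    (farmer_id, amount) VALUES (1, 100);
INSERT INTO dbo.payments (farmer_id, amount) VALUES (1, 30), (1, 20);

-- Naive version: the single sales row is repeated once per payment row,
-- so farmer_sales comes out as 200 instead of 100.
SELECT f.farmer_id, SUM(s.amount) AS farmer_sales
FROM dbo.farmers AS f
JOIN dbo.sales    AS s ON s.farmer_id = f.farmer_id
JOIN dbo.payments AS p ON p.farmer_id = f.farmer_id
GROUP BY f.farmer_id;

-- Pre-aggregated version: each derived table has one row per farmer,
-- so the joins cannot multiply rows. farmer_sales = 100, farmer_payments = 50.
SELECT f.farmer_id,
       COALESCE(s.farmer_sales, 0)    AS farmer_sales,
       COALESCE(p.farmer_payments, 0) AS farmer_payments
FROM dbo.farmers AS f
LEFT JOIN (
    SELECT farmer_id, SUM(amount) AS farmer_sales
    FROM dbo.sales
    GROUP BY farmer_id
) AS s ON s.farmer_id = f.farmer_id
LEFT JOIN (
    SELECT farmer_id, SUM(amount) AS farmer_payments
    FROM dbo.payments
    GROUP BY farmer_id
) AS p ON p.farmer_id = f.farmer_id;
```

Whether this matches the original problem depends on the actual tables and requirements, which were not posted.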

Does every record have a unique field in SQL Server?

I'm working in Visual Studio - VB.NET.
My problem is that I want to delete a specific row in SQL Server but the only unique column I have is an Identity that increments automatically.
My process of work:
1. I add a row to the table (the identity is incremented, but I don't know the number)
2. I want to delete the previous row
Is there a sort of unique ID that every new record has?
It's possible that my table has 2 exactly the same records, just the sequence (identity) is different.
Any ideas how to handle this problem?
SQL Server has a few functions that return the generated ID for the last inserted row, each with its own specific strengths and weaknesses.
Basically:
@@IDENTITY works if you do not use triggers.
SCOPE_IDENTITY() works for the code you explicitly called.
IDENT_CURRENT('tablename') works for a specific table, across all scopes.
In almost all scenarios SCOPE_IDENTITY() is what you need, and it's a good habit to use it, as opposed to the other options.
A good discussion on the pros and cons of the approaches is also available here.
I want to delete the previous row
And that is your problem. There is no such concept in SQL as a 'previous row'. The word previous implies order, and order applies only to queries, where it is achieved by adding an ORDER BY clause. Tables have no order. You need to rephrase this in terms of "I need to delete the record that satisfies <this> condition." This may sound to you like pedantic gibberish, but you will never find a solution until you acknowledge the problem.
Searching for a way to interpret the value of the inserted identity column and then subtracting 1 from it is flawed with many many many problems. It is incorrect under concurrency. It is incorrect in presence of rollbacks. It is incorrect after ETL jobs. Overall, never expect monotonically increasing identities, they're free to jump gaps and your code should be correct in presence of gaps.
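To make the two answers above concrete, here is a minimal T-SQL sketch that captures the new identity value with SCOPE_IDENTITY() and then states "previous" as an explicit condition rather than subtracting 1. The table dbo.Notes and its columns are hypothetical, and "the highest NoteId below the one I just inserted" is only one possible definition of the row to delete.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Notes (
    NoteId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Body   nvarchar(200)     NOT NULL
);

DECLARE @NewId int, @OlderId int;

INSERT INTO dbo.Notes (Body) VALUES (N'first');
INSERT INTO dbo.Notes (Body) VALUES (N'second');

-- Identity value generated by the last INSERT in this scope.
SET @NewId = SCOPE_IDENTITY();

-- "Previous" expressed as a condition: the largest existing NoteId
-- smaller than the new one. This tolerates gaps in the identity values.
SELECT @OlderId = MAX(NoteId)
FROM dbo.Notes
WHERE NoteId < @NewId;

-- Deletes nothing if no such row exists (@OlderId is NULL).
DELETE FROM dbo.Notes
WHERE NoteId = @OlderId;
```

Under concurrency the row found this way may have been inserted by another session, so a real solution would likely need an additional condition (an owner, a batch key, or similar) that identifies the intended row.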

When does it make sense to use @@IDENTITY and not SCOPE_IDENTITY?

According to MSDN, @@IDENTITY returns the last identity value generated for any table in the current session, across all scopes.
Has anyone come across a situation where this functionality was useful? I can't think of a situation when you would want the last ID generated for any table across all scopes or how you would use it.
UPDATE:
Not sure what all the downvotes are about, but I figured I'd try to clarify what I'm asking.
First off, I know when to use SCOPE_IDENTITY and IDENT_CURRENT. What I'm wondering is when it would be better to use @@IDENTITY as opposed to these other options. I have yet to find a place to use it in my day-to-day work, and I'm wondering if someone can describe a situation where it is the best option.
Most of the time when I see it, it is because someone doesn't understand what they were doing, but I assume Microsoft included it for a reason.
In general, it shouldn't be used. SCOPE_IDENTITY() is far safer to use (as long as we're talking about single-row inserts, as highlighted in a comment above), except in the following scenario, where @@IDENTITY is one approach that can be used (SCOPE_IDENTITY() cannot in this case):
your code knowingly fires a trigger
the trigger inserts into a table with an identity column
the calling code needs the identity value generated inside the trigger
the calling code can guarantee that the @@IDENTITY value generated within the trigger will never be changed to reflect a different table (e.g. someone adds logging to the trigger after the insert you were relying on)
This is an odd use case, but feasible.
Keep in mind this won't work if you are inserting multiple rows and need multiple IDENTITY values back. There are other ways, but they also require the option to allow result sets from triggers, and IIRC this option is being deprecated.
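The trigger scenario can be reproduced with a small T-SQL script. This is a sketch under assumptions: the tables dbo.Orders and dbo.OrderAudit and the trigger name are hypothetical, and GO is the batch separator understood by tools such as SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Orders (
    OrderId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Item    nvarchar(50)      NOT NULL
);
CREATE TABLE dbo.OrderAudit (
    AuditId int IDENTITY(1000,1) NOT NULL PRIMARY KEY,
    OrderId int                  NOT NULL
);
GO

-- The trigger inserts into a second table that has its own identity column.
CREATE TRIGGER dbo.trg_Orders_Insert
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.OrderAudit (OrderId)
    SELECT OrderId FROM inserted;
END;
GO

INSERT INTO dbo.Orders (Item) VALUES (N'Widget');

-- SCOPE_IDENTITY() reports the value generated by the INSERT above (1).
-- @@IDENTITY reports the last value generated in the session, which is
-- the one created inside the trigger for dbo.OrderAudit (1000).
SELECT SCOPE_IDENTITY() AS OrderIdFromScope,
       @@IDENTITY       AS LastIdentityInSession;
```

On freshly created tables this returns 1 and 1000. The same script also shows why @@IDENTITY surprises code that only expected the Orders value once a trigger is added later.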
The IDENTITY column property (not to be confused with the @@IDENTITY function) is fantastic for identifying individual rows based on an ID column.
ID | Name | Age
1  | AA   | 20
2  | AB   | 30
etc...
In this case the ID column would rely on the IDENTITY property.

SQL Server 2008 - Check For Row Changes

Instead of using a ton of OR conditions to check whether a row has been altered, I was looking into checksum() or binary_checksum(). What is best practice for this situation? Is it checksum(), binary_checksum(), or some other method? I like the idea of using one of the checksum options so I don't have to build a massive OR condition for my update.
EDIT:
Sorry everyone, I should have provided more detail. I need to pull in data from some outside sources, but because I am using merge replication I don't want to just blow away and rebuild the tables. I want to update or insert only the rows that really have changes or don't exist. I will have a pared-down version of the source data in my target db that will get synced down to clients. I was trying to find a good way to detect row changes without having to look at every single column to perform the update.
Any suggestions are greatly appreciated.
Thanks,
S
First, if you are using actual Merge replication, it should take care of updating the proper rows for you.
Second, typically the way to determine if a row has changed is to use a column with a data type of timestamp, now called rowversion, which changes each time the row is updated. However, this type of column will only tell you if the value has changed since the last time you read it, which means you must have read and stored the timestamps to use in the comparison. Thus, this may not work for you.
Lastly, a solution which may work for you would be triggers on the table in question that update an actual DateTime (or better yet, DateTime2) column with the current date and time when an insert or update takes place. You would need to store the datetime of your last synchronization and compare it with the last-updated column to determine which rows had changed.
It might help if we have a bit more info about what you are doing but in general the checksum() option does work well as long as you have access to the original checksum of the row to compare to.
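As a rough illustration of the checksum idea, the T-SQL sketch below updates only rows whose BINARY_CHECKSUM over the compared columns differs between source and target, then inserts rows that are missing. The tables dbo.SourceProduct and dbo.TargetProduct and their columns are hypothetical. One known limitation: different values can produce the same checksum, so an occasional real change may go undetected.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.SourceProduct (
    ProductId int           NOT NULL PRIMARY KEY,
    Name      nvarchar(100) NULL,
    Price     decimal(10,2) NULL
);
CREATE TABLE dbo.TargetProduct (
    ProductId int           NOT NULL PRIMARY KEY,
    Name      nvarchar(100) NULL,
    Price     decimal(10,2) NULL
);

-- Update only the rows whose compared columns appear to differ.
UPDATE t
SET    t.Name  = s.Name,
       t.Price = s.Price
FROM   dbo.TargetProduct AS t
JOIN   dbo.SourceProduct AS s
       ON s.ProductId = t.ProductId
WHERE  BINARY_CHECKSUM(t.Name, t.Price) <> BINARY_CHECKSUM(s.Name, s.Price);

-- Insert the rows that do not exist in the target yet.
INSERT INTO dbo.TargetProduct (ProductId, Name, Price)
SELECT s.ProductId, s.Name, s.Price
FROM   dbo.SourceProduct AS s
WHERE  NOT EXISTS (
           SELECT 1
           FROM   dbo.TargetProduct AS t
           WHERE  t.ProductId = s.ProductId
       );
```

If missed changes are unacceptable, an alternative not mentioned in the thread is to replace the checksum predicate with WHERE EXISTS (SELECT s.Name, s.Price EXCEPT SELECT t.Name, t.Price), which compares the actual values and treats NULLs as equal to each other.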

Postgresql: keep 2 sequences synchronized

Is there a way to keep 2 sequences synchronized in Postgres?
I mean if I have:
table_A_id_seq = 1
table_B_id_seq = 1
if I execute SELECT nextval('table_A_id_seq'::regclass)
I want table_B_id_seq to take the same value as table_A_id_seq,
and obviously the same must hold in the other direction.
I need 2 different sequences because I have to hack some constraints I have in Django (and that I cannot solve there).
The two tables must be related in some way? I would encapsulate that relationship in a lookup table containing the sequence and then replace the two tables you expect to be handling with views that use the lookup table.
Just use one sequence for both tables. You can't keep them in sync unless you sync them over and over again. Sequences are not transaction safe: they always roll forwards, never backwards, not even on ROLLBACK.
Edit: one sequence is also not going to work, since it doesn't give you the same number for both tables. Use a subquery to get the correct number and use just a single sequence for a single table. The other table has to use the subquery.
My first thought when seeing this is why do you really want to do this? This smells a little spoiled, kinda like milk does after being a few days expired.
What is the scenario that requires that these two seq stay at the same value?
Ignoring the "this seems a bit odd" feelings I'm getting in my stomach you could try this:
Put a trigger on table_a that does this on insert.
--set b seq to the value of a.
select setval('table_b_seq',currval('table_a_seq'));
The problem with this approach is that it assumes only an insert into table_a will change the table_a_seq value and nothing else will be incrementing table_a_seq. If you can live with that, this may work in a really hackish fashion that I wouldn't release to production if it were my call.
If you really need this, make it more robust by creating a single interface to increment table_a_seq, such as a function, and only allow manipulation of table_a_seq via this function. That way there is one interface to increment table_a_seq, and you should also put
select setval('table_b_seq',currval('table_a_seq'));
into that function. That way, no matter what, table_b_seq will always be set equal to table_a_seq. That means removing any grants the users have on table_a_seq and granting them only execute permission on the new function.
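The single-function interface suggested above might look like the following PostgreSQL sketch. The function name next_synced_id is hypothetical, and the sequence names table_a_id_seq and table_b_id_seq are assumed (the thread uses slightly different spellings in different places). CREATE SEQUENCE IF NOT EXISTS requires PostgreSQL 9.5 or later.

```sql
-- Assumed sequence names, created here so the sketch is self-contained.
CREATE SEQUENCE IF NOT EXISTS table_a_id_seq;
CREATE SEQUENCE IF NOT EXISTS table_b_id_seq;

-- Hypothetical helper: the only place where table_a_id_seq is advanced.
CREATE OR REPLACE FUNCTION next_synced_id() RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    new_value bigint;
BEGIN
    -- Advance sequence A and remember the value it handed out.
    new_value := nextval('table_a_id_seq');

    -- Set sequence B's last value to the same number.
    PERFORM setval('table_b_id_seq', new_value);

    RETURN new_value;
END;
$$;

-- Callers use the function instead of touching either sequence directly.
SELECT next_synced_id();
```

This keeps the last value of both sequences equal only as long as nothing else calls nextval or setval on them. With concurrent callers the two statements inside the function can interleave, so sequence B may briefly be set to an older value; the sketch does not address that.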
You could put an INSERT trigger on Table_A that executes some code that increases Table_B's sequence. Now, every time you insert a new row into Table_A, it will fire off that trigger.
