SQL Server 2008 - Check For Row Changes

Instead of using a ton of OR statements to check whether a row has been altered, I was looking into checksum() or binary_checksum(). What is best practice for this situation? Is it checksum(), binary_checksum(), or some other method? I like the idea of using one of the checksum options so I don't have to build a massive OR statement for my update.
EDIT:
Sorry everyone, I should have provided more detail. I need to pull in data from some outside sources, but because I am using merge replication I don't want to just blow away and rebuild the tables. I want to update or insert only the rows that really have changes or don't exist yet. I will have a pared-down version of the source data in my target DB that will get synced down to clients. I was trying to find a good way to detect the row changes without having to look at every single column to perform the update.
Any suggestions are greatly appreciated.
Thanks,
S

First, if you are using actual Merge replication, it should take care of updating the proper rows for you.
Second, the typical way to determine whether a row has changed is to use a column with the timestamp data type, now called rowversion, which changes each time the row is updated. However, this type of column only tells you that the value changed since the last time you read it, which means you have to have read and stored the rowversion values to compare against. Thus, this may not work for you.
Lastly, a solution which may work for you would be a trigger on the table in question that sets an actual DateTime (or better yet, DateTime2) column to the current date and time whenever an insert or update takes place. Your comparison would store the datetime you last synchronized to the table and compare it against the last-updated column to determine which rows had changed.
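For illustration, a minimal sketch of such a trigger; the SourceData table, its Id key, and the LastUpdated column are all hypothetical names:

CREATE TRIGGER trg_SourceData_Touch
ON dbo.SourceData
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE s
    SET s.LastUpdated = SYSDATETIME()
    FROM dbo.SourceData AS s
    INNER JOIN inserted AS i ON i.Id = s.Id;
END

(With the default RECURSIVE_TRIGGERS OFF setting, the trigger's own UPDATE won't re-fire it.)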

It might help if we had a bit more info about what you are doing, but in general the checksum() option does work well, as long as you have access to the original checksum of the row to compare against.
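For example, a sketch of that approach with hypothetical Source and Target tables that share an Id key and two data columns:

UPDATE t
SET t.Col1 = s.Col1,
    t.Col2 = s.Col2
FROM dbo.Target AS t
INNER JOIN dbo.Source AS s ON s.Id = t.Id
WHERE BINARY_CHECKSUM(t.Col1, t.Col2) <> BINARY_CHECKSUM(s.Col1, s.Col2);

Bear in mind that checksums can collide, so a genuinely changed row can occasionally be missed; if you need a guarantee, compare the columns themselves or use a rowversion column.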

Related

SQL Server vector clock

Is there a global sequence number in SQL Server that is guaranteed to increase monotonically (even when the system time regresses), and can be accessed as part of an insert or update operation?
Yes: the rowversion data type and the @@DBTS function are what you're looking for.
This pattern of marking rows with a rowversion is implemented at a lower level by the Change Tracking feature, which tracks inserts, updates, and deletes and doesn't require you to add a column to your table.
I'm pretty sure ROWVERSION does what you want. A ROWVERSION-typed column is guaranteed to be unique within any single database and, per the SQL documentation, it is nothing more than an incrementing number. If you save MAX(ROWVERSION) each time you've finished updating your data, you can find updated or inserted rows in your next pass by looking for ROWVERSIONs that are bigger than the saved MAX(). Note that you cannot catch deletes in this fashion!
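A sketch of that pattern; the SyncState bookkeeping table is a hypothetical stand-in for wherever you save the high-water mark:

-- One-time setup: ALTER TABLE dbo.MyTable ADD RowVer rowversion;
DECLARE @LastSync binary(8);
SELECT @LastSync = SavedMaxRowVer FROM dbo.SyncState;  -- saved at the end of the previous pass

SELECT *                                               -- rows inserted or updated since then
FROM dbo.MyTable
WHERE RowVer > @LastSync;

UPDATE dbo.SyncState
SET SavedMaxRowVer = (SELECT MAX(RowVer) FROM dbo.MyTable);  -- new high-water mark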
Another approach is to use LineageId's and triggers. I'm happy to explain that approach if it would help, but I think ROWVERSION is a simpler solution.

Stored procedure to update different columns

I have an API that I'm trying to read that gives me just the updated fields. I'm trying to take that and update my tables using a stored procedure. So far the only way I have been able to figure out how to do this is with dynamic SQL, but I would prefer not to do that if there is a way to avoid it.
If it were just a couple of columns, I'd just write a proc for each, but we are talking about 100 fields, and any of them could be updated together. One ticket might just need a timestamp updated, while the next might need a timestamp and who modified it, and the one after that just a note.
Everything I've read and been taught has told me that dynamic SQL is bad, and while I'll write it if I have to, I'd prefer to have a proc.
You can perhaps do something like this:
IF EXISTS (SELECT * FROM NEWTABLE n
           WHERE NOT EXISTS (SELECT * FROM OLDTABLE o
                             WHERE o.PRIMARYKEY = n.PRIMARYKEY
                               AND o.OLDRECORDS = n.NEWRECORDS))
BEGIN
    UPDATE o
    SET o.OLDRECORDS = n.NEWRECORDS
    FROM OLDTABLE o
    INNER JOIN NEWTABLE n ON n.PRIMARYKEY = o.PRIMARYKEY;
END
The best way to solve your problem is using MERGE:
Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
As you can see, your update could be more complex but more efficient as well. Using MERGE requires some proficiency, but once you start to use it you'll use it with pleasure again and again.
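A minimal sketch, assuming hypothetical Source and Target tables keyed on Id with two data columns:

MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.Id = s.Id
WHEN MATCHED AND (t.Col1 <> s.Col1 OR t.Col2 <> s.Col2) THEN
    UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2) VALUES (s.Id, s.Col1, s.Col2);

The WHEN MATCHED predicate skips rows that haven't actually changed; if the columns are nullable, that comparison needs extra care around NULLs.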
I am not sure how your business logic works that determines what columns are updated at what time. If there are separate business functions that require updating different but consistent columns per function, you will probably want to have individual update statements for each function. This will ensure that each process updates only the columns that it needs to update.
On the other hand, if your API is such that you really don't know ahead of time what needs to be updated, then building a dynamic SQL query is a good idea.
Another option is to build a save proc that sets every user-configurable field. As long as the calling process has all of that data, it can call the save procedure and pass every updateable column. There is no harm in running UPDATE MyTable SET MyCol = @MyCol with the same value on each side.
Note that even if all of the values are the same, the rowversion (or timestamp) column, if present, will still be updated.
With our software, the tables that users can edit have a widely varying range of columns. We chose to create a single save procedure for each table that has all of the update-able columns as parameters. The calling processes (our web servers) have all the required columns in memory. They pass all of the columns on every call. This performs fine for our purposes.
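As a sketch of that pattern, with hypothetical names and only a couple of the many parameters shown:

CREATE PROCEDURE dbo.SaveTicket
    @TicketId   int,
    @Note       nvarchar(max),
    @ModifiedBy nvarchar(128)
    -- ...one parameter per updateable column
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.Ticket
    SET Note       = @Note,
        ModifiedBy = @ModifiedBy
        -- ...every updateable column, whether or not it changed
    WHERE TicketId = @TicketId;
END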

Way to persist function result as a constant

I needed to create a function today which will always return the exact same value on the specific database it's executed on. It may or may not be the same across databases, which is why it has to be able to load the value from a table the first time it's required.
CREATE FUNCTION [dbo].[PAGECODEGET] ()
RETURNS nvarchar(6)
AS
BEGIN
    DECLARE @PageCode nvarchar(6) = ( SELECT PCO_IDENTITY FROM PAGECODES WHERE PCO_PAGE = 'SWTE' AND PCO_TAB = 'RECORD' );
    RETURN @PageCode;
END
The PCO_IDENTITY field is a SQL identity column, so once the record is inserted for the first time, the function will always return the same result thereafter.
My question is, is there any way to persist this value to something equivalent to a C# readonly variable?
From a performance point of view I know SQL Server will optimise the plan etc., but from a best-practice point of view I'm thinking there may be a better way of doing it.
We use a mix of SQL Servers, but the lowest is 2008 R2 in case there's a version specific solution.
I'm afraid there's no such thing as a global variable like you suggest in SQL Server.
As you've pointed out, the function will potentially return different results on another database, depending on a variety of factors, such as when the row was inserted, what other values exist in the table already etc. - basically, the PCO_IDENTITY value for this row cannot be relied upon to be consistent.
A few observations:
I don't see how getting this value occasionally is really going to be a performance bottleneck. I don't think best practices cover this, as selecting a value from a table is as basic as you can get.
If this is part of another, larger query, you will probably get better performance by joining to the PAGECODES table directly, rather than potentially running this function for every row.
However, if you are really worried:
There are objects in the database which are persistent: tables. When you first insert this value, retrieve the PCO_IDENTITY value and create a new table with just that in it, which you can join to in your queries. Seems a bit of a waste for one value, doesn't it? (Note you could also make a view, but how would that perform any better than the function you started with?)
You could force these values into a row with a specific PCO_IDENTITY value using IDENTITY_INSERT. That way the value is consistent and you know what it is, so you could hard-code it in your queries. (NB: turn IDENTITY_INSERT off again afterwards, and other rows inserted into this table will continue to be generated automatically.)
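A sketch of that approach, reusing the PAGECODES table from the question (the key value 100000 is an arbitrary, agreed-upon number):

SET IDENTITY_INSERT dbo.PAGECODES ON;

INSERT INTO dbo.PAGECODES (PCO_IDENTITY, PCO_PAGE, PCO_TAB)  -- an explicit column list is required here
VALUES (100000, 'SWTE', 'RECORD');

SET IDENTITY_INSERT dbo.PAGECODES OFF;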
TL;DR: How you are doing it is probably fine. I suspect you are trying to optimise something that isn't a problem. As always - if in doubt, try out a few approaches and measure.

Difference between MIN(__$start_lsn) and fn_cdc_get_min_lsn?

Using CDC on SQL Server 2012.
I have a table (MyTable) which is CDC enabled. I thought the following two queries would always return the same value:
SELECT MIN(__$start_lsn) FROM cdc.dbo_MyTable_CT;
SELECT sys.fn_cdc_get_min_lsn('dbo_MyTable');
But they don't seem to do so: in my case the first one returns 0x00001EC6000000DC0003 and the second one 0x00001E31000000750001, so the absolute minimum in the table is actually greater than the value returned by fn_cdc_get_min_lsn.
My questions:
Why are the results different?
Is there any problem with using the value from the first query as the first parameter on fn_cdc_get_all_changes_dbo_MyTable? (all examples I've seen use the value from the second query)
My understanding is that the first one returns the oldest LSN for the data that's currently in the CDC table and the latter reflects when the table was added to CDC. I will note though that you'll only want to use the minimum (whichever method you go with) once so you don't process duplicate records. Also, since the second method gets its result from sys.cdc_tables (which very likely has far fewer rows than your CDC table does), it's going to be more efficient.
sys.fn_cdc_get_min_lsn returns the minimum available lsn for a change captured table.
As @Ben says, this can be different from (earlier than) the earliest change actually captured, for example when a table is first added to CDC and there haven't been any changes yet.
As per the MSDN doco, you should always use this to validate your query range prior to execution, because change data will eventually get cleaned up. So you will not only use this once: you will check it every time.
You should use this rather than getting the min LSN in other ways because:
it'll be faster (as Ben pointed out), potentially much faster;
it's the documented API for doing so, and the implementation of the backing tables might change in future versions.
The workflow is generally:
load your previous LSN from (your state)
query the current LSN
query the minimum LSN available for the table
if prev >= min available, load changes only
otherwise, reload the whole table and handle it (somehow)
save the current LSN to (your state)
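A sketch of that workflow against the dbo_MyTable capture instance from the question; the SyncState table holding the saved LSN is hypothetical:

DECLARE @prev_lsn binary(10), @to_lsn binary(10), @min_lsn binary(10);

SELECT @prev_lsn = LastLsn FROM dbo.SyncState;         -- load your previous LSN
SET @to_lsn  = sys.fn_cdc_get_max_lsn();               -- current LSN
SET @min_lsn = sys.fn_cdc_get_min_lsn('dbo_MyTable');  -- minimum still available

IF @prev_lsn >= @min_lsn
    SELECT *                                           -- changes only
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(
             sys.fn_cdc_increment_lsn(@prev_lsn), @to_lsn, N'all');
ELSE
    SELECT * FROM dbo.MyTable;                         -- history was cleaned up (or first run): reload everything

UPDATE dbo.SyncState SET LastLsn = @to_lsn;            -- save the current LSN

(sys.fn_cdc_increment_lsn bumps the saved LSN by one tick, since the from_lsn argument is inclusive and you've already processed it.)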

Tracking a Change on a Column

I recently ran across a very interesting problem involving database design.
I have changed the table names to simplify the problem, so let me describe it as such:
I have two tables, Fruit and Vegetable, each of which stores whether or not a fruit or vegetable is tasty.
Now let's say that someone keeps changing the IsTasty setting through the UI of my application, and management is demanding the ability to see when someone last changed it and who. The tricky part here is that although we are ignoring the other data on the table, there is other data, and we don't want to track when any data on the table was changed, just this one column.
What is the best way to solve this problem?
I have a description of the problem with ER diagrams here:
I like the way the acts_as_versioned plugin does it in Rails. It's closest to your solution 2, with an additional version-number field. You basically have your fruits table and a fruits_versions table. Every time a row in fruits is updated, you insert the same data into the fruits_versions table with an incremented version number.
I think it's more extensible than your solution 3 approach if you ever want to add more fields to the tables or track additional values. Solution 4 is sort of a non-relational solution; you could probably keep an audit log like that.
Another approach, as opposed to keeping the versions, is to keep track of the changes, like Subversion or another version control system does. This can make it easier if you often need to know whether something changed from a to b, versus just what it changed to. I guess that means the real answer is "it depends" on how the data will be used.
If you are using SQL Server 2008, have you taken a look at CDC (Change Data Capture)?
In a trigger, there is a way to see which columns were modified. Using SQL Server 2005 and up, you can create a trigger with an IF statement like this:
IF UPDATE(IsTasty)
BEGIN
    INSERT INTO Log (ID, NewValue)
    SELECT ID, IsTasty
    FROM inserted;
END
One way to do this is to add triggers to those tables. In the triggers check to see if the column you are interested has changed. If it has changed, insert a row into another table that tracks changes. The change tracking table might want to store data like the name of the column that was changed, the previous value, the new value and the date of the change.
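For example, a sketch of such a trigger for the Fruit table; the ChangeLog table layout and the Id key are hypothetical:

CREATE TRIGGER trg_Fruit_IsTasty_Audit
ON dbo.Fruit
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF UPDATE(IsTasty)
    BEGIN
        INSERT INTO dbo.ChangeLog (TableName, ColumnName, KeyValue, OldValue, NewValue, ChangedBy, ChangedAt)
        SELECT 'Fruit', 'IsTasty', i.Id, d.IsTasty, i.IsTasty, SUSER_SNAME(), GETDATE()
        FROM inserted AS i
        INNER JOIN deleted AS d ON d.Id = i.Id
        WHERE d.IsTasty <> i.IsTasty;  -- IF UPDATE() only says the column was assigned; this filters to real changes
    END
END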
For SQL Server, I use AutoAudit to generate the triggers. The audit table contains the history, and views can be used to show the changes (AutoAudit creates views for deleted rows automatically).
