Difference between MIN(__$start_lsn) and fn_cdc_get_min_lsn? - sql-server

Using CDC on SQL Server 2012.
I have a table (MyTable) which is CDC enabled. I thought the following two queries would always return the same value:
SELECT MIN(__$start_lsn) FROM cdc.dbo_MyTable_CT;
SELECT sys.fn_cdc_get_min_lsn('dbo_MyTable');
But they don't seem to do so: in my case the first one returns 0x00001EC6000000DC0003 and the second one 0x00001E31000000750001, so the absolute minimum in the table is actually greater than the value returned by fn_cdc_get_min_lsn.
My questions:
Why are the results different?
Is there any problem with using the value from the first query as the first parameter on fn_cdc_get_all_changes_dbo_MyTable? (all examples I've seen use the value from the second query)

My understanding is that the first query returns the oldest LSN for the data currently in the CDC table, while the latter reflects when the table was added to CDC. I will note, though, that you'll only want to use the minimum (whichever method you go with) once, so you don't process duplicate records. Also, since the second method gets its result from cdc.change_tables (which very likely has far fewer rows than your CDC table does), it's going to be more efficient.

sys.fn_cdc_get_min_lsn returns the minimum available LSN for a change-captured table.
Like @Ben says, this can be different from (earlier than) the earliest change actually captured, for example when a table has just been added to CDC and there haven't been any changes yet.
As per the MSDN doco, you should always use it to validate your query ranges prior to execution, because change data will eventually get cleaned up. So you will not use it just once: you will check it every time.
You should use this rather than getting the min LSN in other ways because:
- it'll be faster (as Ben pointed out), potentially much faster;
- it's the documented API for doing so; the implementation of the backing tables might change in future versions.
The workflow is generally (sketched below):
1. load your previous LSN from your state
2. query the current LSN
3. query the minimum available LSN for the table
4. if previous >= minimum available, load only the changes
5. otherwise, load the whole table and handle it (somehow)
6. save the current LSN to your state
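A minimal T-SQL sketch of that workflow, using the capture instance from the question; @PrevLsn stands in for whatever you load from your own state store:

DECLARE @PrevLsn binary(10),     -- loaded from your state store (assumption)
        @CurrentLsn binary(10),
        @MinLsn binary(10);

SET @CurrentLsn = sys.fn_cdc_get_max_lsn();
SET @MinLsn = sys.fn_cdc_get_min_lsn('dbo_MyTable');

IF @PrevLsn >= @MinLsn
BEGIN
    -- Changes since the last run are still available
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(
        sys.fn_cdc_increment_lsn(@PrevLsn), @CurrentLsn, N'all');
END
ELSE
BEGIN
    -- Cleanup has discarded part of the range: fall back to a full load
    SELECT * FROM dbo.MyTable;
END
-- Finally, save @CurrentLsn back to your state store for the next run.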

Related

database row is changing every time I update the row

So I am using a Postgres database and I have a function that updates rows in the database. For some reason, every time I change something, the row gets "pushed" to the end of the table rather than staying where it was.
This is an example of how I update the data (part of the function):
users.query.filter_by(username = user).update(dict(computer_id = assign_value, level=level))
db.session.commit()
But for some reason, whenever I look at the users table, I can see that whatever row I updated has been pushed to the end of the table.
There is no such thing as an ordering of the records in a table. Internally, updating a record is handled as inserting a newer version and deleting the older version at some later point (once the transaction completes, the older version should not be needed again, at least not by newer transactions). From this point of view, it even makes some sense that the record is "moved" to the end of the table (even though the table does not really have a start or an end).
If you want a certain ordering, consider querying the data with an appropriate ORDER BY (or whatever function or option your framework uses for this). If you query data without specifying an ordering, the retrieved records may come back in any order. Never rely on things like "if I only insert into this table, the data will always be returned in the same sequence as I inserted it" (even though this might be true under some circumstances).
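For example, assuming the users table has a serial id primary key (not shown in the question), a deterministic ordering would be:

SELECT username, computer_id, level
FROM users
ORDER BY id;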

SQL Server vector clock

Is there a global sequence number in SQL Server which is guaranteed to increase monotonically (even when system time regresses), and which can be accessed as part of an insert or update operation?
Yes: the rowversion data type and the @@DBTS function are what you're looking for.
This pattern of marking rows with a rowversion is implemented at a lower level by the Change Tracking feature, which adds tracking of inserts, updates, and deletes, and doesn't require you to add a column to your table.
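A hedged sketch of enabling Change Tracking; the database, table, and key names here are placeholders, not anything from the question:

-- Enable change tracking at the database level
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Track inserts, updates, and deletes (requires a primary key, no extra column)
ALTER TABLE dbo.MyTable ENABLE CHANGE_TRACKING;

-- Later: everything changed since a previously saved sync version
DECLARE @last_sync_version bigint = 0;  -- load from your own state
SELECT ct.Id, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS ct;

CHANGE_TRACKING_CURRENT_VERSION() gives you the version number to save for the next pass.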
I'm pretty sure ROWVERSION does what you want. A ROWVERSION-typed column is guaranteed to be unique within any single database and, per the SQL documentation, is nothing more than an incrementing number. If you just save MAX(ROWVERSION) each time you've finished updating your data, you can find updated or inserted rows in your next pass by looking for ROWVERSIONs that are bigger than the saved MAX(). Note that you cannot catch deletes in this fashion!
Another approach is to use LineageId's and triggers. I'm happy to explain that approach if it would help, but I think ROWVERSION is a simpler solution.
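A minimal sketch of the ROWVERSION polling approach (table and column names are made up for illustration):

CREATE TABLE dbo.Widgets (
    Id int IDENTITY PRIMARY KEY,
    Name nvarchar(100) NOT NULL,
    RV rowversion      -- bumped automatically on every insert and update
);

-- After each pass, persist the high-water mark:
DECLARE @LastRv binary(8) = (SELECT MAX(RV) FROM dbo.Widgets);

-- Next pass: rows inserted or updated since then (deletes are NOT caught):
SELECT Id, Name FROM dbo.Widgets WHERE RV > @LastRv;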

Does Oracle (RDB in general?) take a snapshot of the table affected by DML?

Objective
To understand the mechanism/implementation used when processing DML against a table. Does a database (I work on Oracle 11g R2) take a snapshot of the table for each DML statement in order to apply it?
Background
I run a SQL statement to replace the old values in the AID field of the target table with the new values from the source table.
UPDATE CASES1 t
SET t.AID = (
    SELECT DISTINCT NID
    FROM REF1
    WHERE oid = t.aid
)
WHERE EXISTS (
    SELECT 1
    FROM REF1
    WHERE oid = t.aid
);
I thought 'OLD01' could end up updated twice (OLD01 -> NEW01 -> SCREWED), but that did not happen.
Question
For each DML statement, does the database take a snapshot of table X (call it X+1) for the first statement, then take a snapshot (call it X+2) of the result (X+1) for the second statement on the table, and so on for each successively executed statement? Is this also used as a mechanism to implement rollback/commit?
Is this expected behaviour, specified in a standard somewhere? If so, kindly point me to the relevant references. I Googled, but I am not sure what the keywords should be to get the right results.
Thanks in advance for your help.
Update
I started reading Oracle Core (ISBN 9781430239543) by Jonathan Lewis and saw the relevant diagram. My current understanding is that UNDO records are created in the UNDO tablespace for each update, and the original data is reconstructed from them; that is what I initially thought of as snapshots.
In Oracle, if you ran that update twice in a row in the same session, with the data as you've shown, I believe you should get the results that you expected. I think you must have gone off track somewhere. (For example, if you executed the update once, then without committing you opened a second session and executed the same update again, then your result would make sense.)
Conceptually, I think the answer to your question is yes (speaking specifically about Oracle, that is). A SQL statement effectively operates on a snapshot of the tables as of the point in time that the statement starts executing. The proper term for this in Oracle is read-consistency. The mechanism for it, however, does not involve taking a snapshot of the entire table before changes are made. It is more the reverse - records of the changes are kept in undo segments, and used to revert blocks of the table to the appropriate point in time as needed.
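A minimal reproduction of the scenario in the question, assuming REF1 maps OLD01 to NEW01 and NEW01 to SCREWED (the values the question hints at):

CREATE TABLE cases1 (aid VARCHAR2(10));
CREATE TABLE ref1 (oid VARCHAR2(10), nid VARCHAR2(10));
INSERT INTO cases1 VALUES ('OLD01');
INSERT INTO ref1 VALUES ('OLD01', 'NEW01');
INSERT INTO ref1 VALUES ('NEW01', 'SCREWED');

-- The statement reads CASES1 and REF1 as of its start time and visits
-- each CASES1 row once, so the row goes OLD01 -> NEW01 and stops there.
UPDATE cases1 t
SET t.aid = (SELECT DISTINCT nid FROM ref1 WHERE oid = t.aid)
WHERE EXISTS (SELECT 1 FROM ref1 WHERE oid = t.aid);

SELECT aid FROM cases1;   -- NEW01, not SCREWED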
The documentation you ought to look at to understand this in some depth is in the Oracle Concepts manual: http://docs.oracle.com/cd/E11882_01/server.112/e40540/part_txn.htm#CHDJIGBH

Count number of times a procedure is executed

Requirement:
To count the number of times a procedure has executed
From what I understand so far, sys.dm_exec_procedure_stats can be used for an approximate count, but only since the last service restart. I found this link on this website relevant, but I need the count to be precise, and it should not be wiped out by a service restart.
Can I have some pointers on this, please?
Hack: the procedure I need to keep track of has a SELECT statement, so it returns some rows that are stored in a permanent table called Results. The simplest solution I can think of is to add a column to the Results table to keep track of procedure executions: select the maximum value from this column before the insert and add one to it to increment the count. This solution seems quite stupid to me as well, but it's the best I could think of.
You could create a sequence object, assuming you're on SQL Server 2012 or newer.
CREATE SEQUENCE ProcXXXCounter
AS int
START WITH 1
INCREMENT BY 1;
And then in the procedure fetch a value from it:
DECLARE @CallCount int;
SELECT @CallCount = NEXT VALUE FOR ProcXXXCounter;
There is of course a small overhead with this, but it doesn't cause the kind of blocking issues that can happen when using a table, because sequences are handled outside of transactions.
Sequence parameters: https://msdn.microsoft.com/en-us/library/ff878091.aspx
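If you want to read the count without consuming another value, you can query the catalog. Note that with the default CACHE setting an unclean shutdown can skip unused cached values, so consider NO CACHE if exactness matters:

SELECT CAST(current_value AS int) AS ExecutionCount
FROM sys.sequences
WHERE name = N'ProcXXXCounter';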
The only way I can think of to keep track of the number of executions, even across service restarts, is to have a table in your database and insert a row into that table inside your procedure every time it is executed.
Maybe add a datetime column as well to collect more info about each execution, and a column for the user who executed it, etc.
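A minimal sketch of that approach (all names here are illustrative):

CREATE TABLE dbo.ProcExecutionLog (
    ProcName   sysname   NOT NULL,
    ExecutedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    ExecutedBy sysname   NOT NULL DEFAULT ORIGINAL_LOGIN()
);
GO
CREATE PROCEDURE dbo.MyProc
AS
BEGIN
    INSERT INTO dbo.ProcExecutionLog (ProcName) VALUES (N'dbo.MyProc');
    -- ... the procedure's actual work ...
END
GO
-- Precise count that survives restarts:
SELECT COUNT(*) FROM dbo.ProcExecutionLog WHERE ProcName = N'dbo.MyProc';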
This can be done easily, and without Enterprise Edition, by using Extended Events. The sqlserver.module_end event will fire; set your predicates correctly and use a histogram target.
http://sqlperformance.com/2014/06/extended-events/predicate-order-matters
https://technet.microsoft.com/en-us/library/ff878023(v=sql.110).aspx
To consume the value, query the histogram target (see the "reviewing target output" examples in the second link).
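A hedged sketch of such a session, assuming the procedure is dbo.MyProc in database MyDb (both names are placeholders). Note the histogram target lives in memory, so its counts do not survive a session or service restart unless you persist them yourself:

CREATE EVENT SESSION ProcExecCount ON SERVER
ADD EVENT sqlserver.module_end (
    WHERE sqlserver.database_name = N'MyDb'
      AND object_name = N'MyProc')
ADD TARGET package0.histogram (
    SET source = N'object_name', source_type = 0);  -- 0 = bucket on an event field
GO
ALTER EVENT SESSION ProcExecCount ON SERVER STATE = START;
GO
-- Read the per-procedure counts out of the histogram target:
SELECT CAST(t.target_data AS xml) AS histogram_xml
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_targets t ON t.event_session_address = s.address
WHERE s.name = N'ProcExecCount' AND t.target_name = N'histogram';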

SQL Server 2008 - Check For Row Changes

Instead of using a ton of OR statements to check whether a row has been altered, I was looking into CHECKSUM() or BINARY_CHECKSUM(). What is best practice for this situation? Is it using CHECKSUM(), BINARY_CHECKSUM(), or some other method? I like the idea of using one of the checksum options so I don't have to build a massive OR statement for my update.
EDIT:
Sorry everyone, I should have provided more detail. I need to pull in data from some outside sources, but because I am using merge replication I don't want to just blow away and rebuild the tables. I want to update or insert only the rows that really have changes or don't exist yet. I will have a pared-down version of the source data in my target DB that will get synced down to clients. I was trying to find a good way to detect the row changes without having to look at every single column to perform the update.
Any suggestions are greatly appreciated.
Thanks,
S
First, if you are using actual Merge replication, it should take care of updating the proper rows for you.
Second, the typical way to determine whether a row has changed is to use a column with a data type of timestamp, now called rowversion, which changes each time the row is updated. However, this type of column will only tell you that the value changed since the last time you read it, which means you have to have read and stored the rowversions to use in the comparison. Thus, this may not work for you.
Lastly, a solution which may work for you would be a trigger on the table in question that updates an actual DateTime (or better yet, DateTime2) column with the current date and time when an insert or update takes place. Your comparison would need to store the datetime you last synchronized with the table and compare it against the last-updated column to determine which rows have changed.
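A sketch of such a trigger, assuming the table has an Id primary key and a LastModified datetime2 column (both names are mine, not from the question; also assumes RECURSIVE_TRIGGERS is off, the default):

CREATE TRIGGER dbo.trg_MyTable_Touch
ON dbo.MyTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET t.LastModified = SYSDATETIME()
    FROM dbo.MyTable AS t
    JOIN inserted AS i ON i.Id = t.Id;
END
GO
-- Sync pass: rows touched since the last synchronization
-- SELECT * FROM dbo.MyTable WHERE LastModified > @LastSyncTime;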
It might help if we had a bit more info about what you are doing, but in general the CHECKSUM() option works well as long as you have access to the original checksum of the row to compare against.
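If you do go the checksum route, here is a hedged sketch of the comparison inside an upsert; Id and the Col columns are placeholders. Keep in mind checksums can collide, so a small fraction of real changes could be missed, and BINARY_CHECKSUM is case-sensitive where CHECKSUM is not:

MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.Id = s.Id
WHEN MATCHED AND BINARY_CHECKSUM(t.Col1, t.Col2, t.Col3)
             <> BINARY_CHECKSUM(s.Col1, s.Col2, s.Col3) THEN
    UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2, t.Col3 = s.Col3
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2, Col3) VALUES (s.Id, s.Col1, s.Col2, s.Col3);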
