Environment:
SQL Server 2005 SP2 (9.0.3077)
Transactional Publications (Production and Beta)
I have a situation where I have two different replication publications set up that share some of the same articles. Each of these publications feeds a subscriber on a different machine. One of these shared articles is a table. At a regular interval many of the records in this table become aged and are no longer needed, at which point a stored procedure that deletes those records is called.
To save on resources and improve latency to the subscribers I have set the replicate property on this stored procedure to “Execution of the stored procedure” instead of the default “Stored procedure definition only”. This way, when the stored procedure deletes 2,000,000+ records, those deletes aren’t replicated row by row to the subscribers. Instead, the execution of the stored procedure is replicated, and the replicated copy of the stored procedure on each subscriber is executed and deletes the same 2,000,000+ rows.
The problem I’m having is with my second publication. I didn’t need this behavior there, so I left the article property on the stored procedure set to “Stored procedure definition only” and expected replication to remove the rows at the other subscriber, but it didn’t. The table at that subscriber just kept gaining records. To fix it I set the article property to “Execution...” and called it good. That is probably the best solution anyway, since beta then matches production, but it still feels like a kludge, as the publication properties should work independently of each other.
Question: Why does the “Execution of the stored procedure” article property take precedence and get applied to the other publication even though it is set to “Stored procedure definition only” in the other publication?
We use replication extensively in our company as we have 38 warehouses in several countries all replicating back to our primary server in London.
Firstly, your replication filters should use views, even the simple ones. That way, if you need to adjust the filter (read: the WHERE clause), you just need to alter the view and you're done. Otherwise you have to re-publish your data and re-subscribe everyone, which can be a real pain.
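For example, something along these lines (a sketch only, with a hypothetical dbo.Orders table as the published data):

-- The filter logic lives in the view, not in the publication itself
CREATE VIEW dbo.vw_Orders_Filter
AS
SELECT OrderID, WarehouseID, OrderDate
FROM dbo.Orders
WHERE WarehouseID IN (1, 2, 3);
GO

-- Later, when the filter needs to change, alter the view instead of
-- re-publishing and re-subscribing:
ALTER VIEW dbo.vw_Orders_Filter
AS
SELECT OrderID, WarehouseID, OrderDate
FROM dbo.Orders
WHERE WarehouseID IN (1, 2, 3, 4);
GO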
You mentioned that you run the same delete on both subscriber and publisher to keep them in sync. This sends shivers down my spine. You're far better off deleting them in one place and letting the server replicate the changes out to the subscribers. Since SQL Server 2005, replication has been very fast and efficient; SQL 2000 was, and still is, quite slow at it. If you're using SQL 2005/2008, just make sure your compatibility level (right-click on the db, Properties, Options) is set to 90 (2005) or 100 (2008). This switches SQL Server over to the faster, more efficient replication methods.
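For example (the database name here is a placeholder):

-- Check the current compatibility level
SELECT name, compatibility_level FROM sys.databases WHERE name = N'YourDatabase';

-- SQL Server 2005 syntax:
EXEC sp_dbcmptlevel N'YourDatabase', 90;

-- SQL Server 2008 and later syntax:
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 100;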
Another way is to not delete the data, but to keep it and filter it out using a where clause in the publication.
It has been a long time since I actively administered replication, but I suspect the answer has to do with the architecture of the log reader and the fact that you are sharing an article between publications. My understanding is that the log reader trawls through the log looking for operations on items that are replicated. Depending on the article settings, either the individual changes to the data are posted to a table in the distribution database or a record of the procedure invocation is posted. In either case, this is a property of the article, not of the publication(s) the article is a member of. I assume (but have not tested and verified) that you can create multiple articles on top of the same database object and have one replicated with @type = 'logbased' and the other with @type = 'proc exec'.
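For reference, the article type is chosen per article when it is added, roughly like this (publication and object names are hypothetical, and I have not verified the multiple-articles-on-one-object scenario):

-- Table article: the log reader posts the individual row changes
EXEC sp_addarticle
    @publication   = N'ProductionPub',
    @article       = N'AgedData',
    @source_object = N'AgedData',
    @type          = N'logbased';

-- Stored procedure article replicated as "execution of the stored procedure"
EXEC sp_addarticle
    @publication   = N'ProductionPub',
    @article       = N'usp_PurgeAgedData',
    @source_object = N'usp_PurgeAgedData',
    @type          = N'proc exec';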
Take all of this with a large pinch of salt: although I now develop on SQL 2008, the last time I did anything with replication was SQL 7.
pjjH
I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free rein to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an Extended Events session on databaseB that watches for all the procedures on databaseA that can insert into the table, or for any SQL statements that run against the table on databaseA (using a LIKE '%your table name here%').
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic that functionality by writing the XE events table to a custom user table every minute or so using the SQL Server Agent.
Your session would monitor databaseA and write the XE output to databaseB; you then write a trigger so that on each XE output write it compares the two tables and, if there are differences, writes the differences to your log table. This would be a nonstop running process, but it is still kind of a periodic scan in a way: the XE only writes when the event happens, but you are still running a check every couple of seconds.
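Something along these lines might be a starting point (a sketch only; the session, event and action names would need testing on your build, and the table-name LIKE filter can be applied when you shred the captured output):

CREATE EVENT SESSION [WatchDatabaseA] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.username, sqlserver.session_id)
    WHERE (sqlserver.database_id = 7)   -- look up DB_ID('DatabaseA') and hard-code it here
)
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [WatchDatabaseA] ON SERVER STATE = START;
GO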
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We have recently been using StreamSets Data Collector for Oracle CDC, but it also supports SQL Server CDC. There are many other competing technologies, including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build real-time data pipelines between databases at the schema level. Until now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can copy all the tables in a schema in near real time in one StreamSets pipeline, provided I have already deployed the DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
I looked at both solutions provided at the time of writing this answer (see Dan Flippo and dfundaka) but found that the first - using Change Data Capture - required modification to the database, and the second - using Extended Events - wasn't really a complete answer, though it got me thinking about other options.
The option that seems cleanest, and doesn't require any database modification, is to use SQL Server Dynamic Management Views. These views, which reside in the system database, expose server process history - in this case INSERTs and UPDATEs - through objects such as sys.dm_exec_sql_text and sys.dm_exec_query_stats, which contain records of database transactions (and are, in fact, what Extended Events seems to be based on).
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc but for the purposes of this particular exercise, this wasn't an issue.
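A rough example of the kind of query involved (the table name is a placeholder, and bear in mind sys.dm_exec_query_stats only covers plans that are still in the cache):

SELECT TOP (50)
       qs.last_execution_time,
       qs.execution_count,
       st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%YourTableName%'
ORDER BY qs.last_execution_time DESC;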
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.
We have a database replication set up where we replicate all tables of a database to multiple production servers.
There are also views, stored procedures, functions, etc. in the database, which are manually deployed to the replicates through T-SQL scripts.
Now if, for example, a new table is added to the publication, we have to reinitialize all subscriptions by creating a new snapshot and letting it be delivered through the Distributor (which is on the same server as the publication). The headache starts when the Distribution Agent wants to drop a table in order to recreate it afterwards: some tables are referenced by views, which are in turn referenced by other objects. The Distributor cannot (or will not) drop those objects and runs into an error like Cannot DROP TABLE 'dbo.table' because it is being referenced by object 'thisisafunctionorview'.
In the past we also had the views, functions and stored procedures in the publication, but that caused even more pain (reinitialisation had to be done after each minor change to procedures, etc.), and the reference problems were just as frustrating then.
To work around this issue we have to drop all functions and views (about 200 objects in total) and recreate them after the snapshot has been delivered.
Does anyone have an idea how we could modify this replication concept so that we can change objects without scheduling a massive downtime window (about 2 hours for 6 replicates) to clean up the mess caused by the references?
To complete the information:
We use MS SQL Server 2008 R2 on all the instances (with Enterprise and Standard Editions). An upgrade to SQL Server 2014 is planned later this year for the Publisher and some of the Subscribers.
Only the publication side requires write access.
Updates to the schema of the database are deployed frequently (about twice a month). Usually there are only changes to procedures, but sometimes tables are added or modified, and that's where our replication concept seems to fall apart.
Any suggestions are welcome
Thanks in advance!
Sincerely
David
It sounds strange to me, because I have a lot of stored procedures in replication and there are no problems with SP changes. ALTER PROCEDURE can be propagated to subscriptions. I also see no problems with subscription re-initialization because of object dependencies. I can remember such problems in merge replication, though, and there is a stored procedure to re-arrange objects there. In most cases SQL Server handles dependencies well. Second note: you can add/remove articles in transactional replication without re-initialization.
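For example, roughly along these lines for push subscriptions (a sketch with hypothetical names; immediate_sync has to be off so that the next snapshot only covers the new article):

-- allow_anonymous must be disabled before immediate_sync can be
EXEC sp_changepublication @publication = N'MyPublication',
     @property = N'allow_anonymous', @value = N'false';
EXEC sp_changepublication @publication = N'MyPublication',
     @property = N'immediate_sync', @value = N'false';

-- Add the new table article to the existing publication
EXEC sp_addarticle
    @publication   = N'MyPublication',
    @article       = N'NewTable',
    @source_object = N'NewTable',
    @type          = N'logbased';

-- Create subscriptions for the new article on the existing subscribers
EXEC sp_refreshsubscriptions @publication = N'MyPublication';

-- Then run the Snapshot Agent; it generates a snapshot only for the new article.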
I think if you re-create the replication in a test environment and play with it a little bit, you'll find a way to replicate schema changes without re-initializing too often. It is possible, but it requires some effort.
What's the best way to track/log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008?
Or is there a better "Audit" feature in SQL Server 2008?
The short answer is that there is no single solution that fits all. It depends on the system and the requirements, but here are a couple of different approaches.
DML Triggers
Relatively easy to implement, because you only have to write one that works well for one table and then apply it to the other tables.
The downside is that it can get messy when you have a lot of tables and even more triggers. Managing 600 triggers for 200 tables (an insert, update and delete trigger per table) is not an easy task.
Also, it might cause a performance impact.
Creating audit triggers in SQL Server
Log changes to database table with trigger
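A rough single-table sketch of the idea (hypothetical dbo.Orders table and audit table; a real version would also capture the old and new column values):

CREATE TABLE dbo.Orders_Audit (
    AuditID   INT IDENTITY(1, 1) PRIMARY KEY,
    OrderID   INT,
    Operation CHAR(1),                        -- I / U / D
    ChangedBy SYSNAME  DEFAULT SUSER_SNAME(),
    ChangedAt DATETIME DEFAULT GETDATE()
);
GO
CREATE TRIGGER trg_Orders_Audit ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Rows only in inserted = INSERT, only in deleted = DELETE, in both = UPDATE
    INSERT dbo.Orders_Audit (OrderID, Operation)
    SELECT COALESCE(i.OrderID, d.OrderID),
           CASE WHEN i.OrderID IS NOT NULL AND d.OrderID IS NOT NULL THEN 'U'
                WHEN i.OrderID IS NOT NULL THEN 'I'
                ELSE 'D'
           END
    FROM inserted AS i
    FULL OUTER JOIN deleted AS d ON d.OrderID = i.OrderID;
END;
GO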
Change Data Capture
Very easy to implement and natively supported, but only in Enterprise Edition, which can cost a lot of $ ;). Another disadvantage is that CDC is still not as evolved as it should be. For example, if you change your schema, history data is lost.
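Setting it up is short, something like this (the dbo.Orders table is a placeholder):

-- Enable CDC on the database, then on each table to be tracked
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;   -- no gating role
GO
-- Changes then show up in cdc.dbo_Orders_CT and through
-- cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all').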
Transaction log analysis
The biggest advantage of this is that all you need to do is put the database in full recovery mode, and all the info will be stored in the transaction log.
However, if you want to do this correctly you’ll need a third party log reader because this is not natively supported.
Read the log file (*.LDF) in SQL Server 2008
SQL Server Transaction Log Explorer/Analyzer
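For a quick look without third-party tooling, the undocumented fn_dblog function will at least show you the raw log records, though interpreting them is hard (a sketch only, and the column meanings are not documented):

SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName
FROM sys.fn_dblog(NULL, NULL)
WHERE Operation IN (N'LOP_INSERT_ROWS', N'LOP_MODIFY_ROW', N'LOP_DELETE_ROWS');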
If you want to implement this I’d recommend you try out some of the third-party tools that exist out there. I’ve worked with a couple of tools from ApexSQL, but there are also good tools from Idera and Netwrix.
ApexSQL Log – auditing by reading transaction log
ApexSQL Comply – uses traces in the background and then parses those traces and stores results in central database.
Disclaimer: I’m not affiliated with any of the companies mentioned above.
Change Data Capture is designed to do what you want, but it requires each table to be set up individually, so depending on the number of tables you have, there may be some logistics to it. It will also only store the data in the capture tables for a couple of days by default, so you may need an SSIS package to pull it out and store it for longer periods.
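If you go that route, the cleanup job's retention can also simply be increased (the value is in minutes; roughly three days is the default):

EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 20160;   -- keep 14 days of change data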
I don't remember whether there is already a tool for this, but you could always use triggers (then you have access to the virtual tables with the changed rows - INSERTED and DELETED). Unfortunately, it could be quite a lot of work if you want to track all tables. I believe there should be some simpler solution, but as I said, I don't remember it.
EDIT.
Maybe this could be helpful:
Change tracking:
http://msdn.microsoft.com/en-us/library/cc280462.aspx
http://msdn.microsoft.com/en-us/library/cc280386.aspx
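Enabling it is straightforward (hypothetical database and table names; the table needs a primary key):

ALTER DATABASE YourDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Orders
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- What changed since version 0 (normally you pass the last version you synced):
SELECT ct.SYS_CHANGE_OPERATION, ct.OrderID
FROM CHANGETABLE(CHANGES dbo.Orders, 0) AS ct;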
This allows you to do audits at the database level; it may or may not be enough to meet the business requirements, as database records usually don't make all that much sense without the logic to glue them together. For instance, knowing that user x inserted a record into the "time_booked" table with a foreign key to the "projects", "users", "time_status" tables may not make all that much sense without the SQL query to glue those 4 tables together.
You may also need to have each database user connect with their own user ID - this is fine with integrated security and a client app, but probably won't work with a website using a connection pool.
The SQL Server logs can't be analyzed just like that. There are some third-party tools available to read the logs, but as far as I know you can't query them for statistics and such. If you need this kind of info you'll have to create some sort of auditing to capture all these events in separate tables, e.g. with DML triggers.
How can I record all the Inserts and Updates being performed on a database (MS SQL Server 2005 and above)?
Basically I want a table in which I can record all the inserts and updates issued on my database.
Triggers will be tough to manage because there are 100s of tables and growing.
Thanks
Bullish
We have hundreds of tables and growing, and we use triggers. In newer versions of SQL Server you can use Change Data Capture or Change Tracking, but we have not found them adequate for auditing.
What we have are two separate audit tables for each table (one recording the details of the change - 1 row even if you updated a million records - and one recording the actual old and new values), but each pair has the same structure and is created by running a dynamic SQL proc that looks for unaudited tables and creates the audit triggers. This proc is run every time we deploy.
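The generator part of that proc boils down to something like this (a sketch only; the real trigger body is built from column metadata, and the trigger naming convention here is hypothetical):

-- List tables that don't yet have an audit trigger and build the
-- CREATE TRIGGER text for them (reviewed here, not executed).
SELECT N'CREATE TRIGGER trg_Audit_' + t.name
     + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
     + N' AFTER INSERT, UPDATE, DELETE AS ...'   -- body generated from metadata
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE NOT EXISTS (SELECT 1 FROM sys.triggers AS tr
                  WHERE tr.parent_id = t.object_id
                    AND tr.name = N'trg_Audit_' + t.name);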
Then you should also take the time to write a proc to pull the data back out of the audit tables in case you want to restore the old values. This can be tricky to write on the fly with this structure, so it is best to have it handy before you have the CEO breathing down your neck while you restore the 50,000 user records that were accidentally deleted.
As of SQL Server 2008 and above you have change data capture.
Triggers, although unwieldy and a maintenance nightmare, will do the job on versions prior to 2008.
Does anyone know how the underlying replication model in SQL Server works? Does it essentially depend on UTC datetime values to determine whether something is new, or does it keep a table of all the changes (like a table of tableID + rowID pairs that have changed)?
I am building my own "replication" system and was planning on using dates to know what to replicate. Then I started wondering what would happen if the date got off on the computer for some reason. The obvious alternative is to keep a log of the changes as you go and, once you replicate those changes, remove them from the log. But that's a lot of extra work compared to just checking dates.
I figure if SQL Server replication works by just checking the dates, then that should be good enough for me.
Any wisdom here?
thanks
As a transaction occurs in SQL Server, it is written to the transaction log along with information pertinent to the transaction.
SQL Server replication uses this transaction log to determine which transactions have not yet been processed and to move them to the subscriber. There is a lot more going on under the hood to keep track of the intersection between transactions, publications, subscriptions, etc. but I will leave that to MSDN documentation about SQL Server replication http://msdn.microsoft.com/en-us/library/ms151198.aspx
Moving on to your point about building your own replication system:
Do not build your own replication system. There are too many complications involved that will cause you to spend many, many days of work. You will be much better off using the features that ship with SQL Server.
SQL Server replication methods are pretty impressive out of the box.
If you outline what causes you to think in terms of building your own replication system, we can help you figure out how to use the existing features to provide what you need.
Also, read up as much as you can here to get an idea of what it can do for you http://msdn.microsoft.com/en-us/library/ms151198.aspx
SQL Server has a LogReader job that is aptly named. Replication reads the transaction log and applies appropriate transactions to the subscribing databases.
For one thing, SQL Server (and it's not the only one) supports multiple replication algorithms.
You can find details here about the ones implemented in SQL Server 2008. Read the X Replication Overview first, then follow the How X Replication Works topics for more details.