CDC when source system is SQL Server Azure/Standard 2008+ - sql-server

Let us say my target staging DB/data warehouse is SQL Server 2008+ Enterprise. However, my source systems are SQL Server Azure/Standard 2008+. Can I still exploit CDC? As far as I understand, I cannot, as I have to turn CDC on in the source systems and it is only available for Enterprise editions. Is this correct? I am also curious what happens if the transaction log is truncated. Thanks.

I just googled it and... if you need this for replicating into a data warehouse you probably only need change tracking https://technet.microsoft.com/en-us/library/cc280519(v=sql.105).aspx. This http://azure.microsoft.com/en-us/documentation/articles/sql-database-preview-whats-new/ says change tracking is available in Azure.
I don't see any specific info anywhere about whether change tracking uses the transaction log, but this info is in one of the links:
The tracking mechanism in change data capture involves an asynchronous capture of changes from the transaction log so that changes are available after the DML operation. In change tracking, the tracking mechanism involves synchronous tracking of changes in line with DML operations so that change information is available immediately.
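For reference, a minimal sketch of what the change tracking route can look like, assuming the source is SQL Server 2008+ or Azure and using placeholder names (SourceDb, dbo.Orders, OrderID):

-- Enable change tracking at the database and table level (placeholder names).
ALTER DATABASE SourceDb
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Orders
    ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- In the warehouse ETL, remember the version you last synced to and ask only
-- for changes made since then.
DECLARE @last_sync_version bigint;
SET @last_sync_version = 0;        -- in practice, persist this value between loads

SELECT ct.SYS_CHANGE_OPERATION,    -- I, U or D
       ct.SYS_CHANGE_VERSION,
       ct.OrderID                  -- primary key columns are returned by CHANGETABLE
FROM   CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct;

SELECT CHANGE_TRACKING_CURRENT_VERSION();  -- store this as @last_sync_version for the next load

Note that, unlike CDC, change tracking only tells you which rows changed, not the intermediate values, so the ETL still reads the current row data from the source table.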

Related

Detect Table Changes In A Database Without Modifications

I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free rein to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an Extended Events session on databaseB that watches for all the procedures on databaseA that can insert into the table, or for any SQL statements that run against the table on databaseA (using a LIKE '%your table name here%').
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic its functionality by writing the XE event data to a custom user table every minute or so using a SQL Server Agent job.
Your session would monitor databaseA and write the XE output to databaseB; you would then write a trigger so that, on each XE output write, it compares the two tables and, if there are differences, writes them to your log table. This would be a constantly running process, but it is still a periodic scan of sorts: the XE session only writes when the event happens, but you are still running a check every couple of seconds.
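A rough sketch of the kind of session described above, heavily hedged: the table name, database ID and file path are placeholders, and this uses the 2008/R2-era package0.asynchronous_file_target (later versions use package0.event_file). Verify the event and action names against sys.dm_xe_objects on your build.

CREATE EVENT SESSION watch_tableA_dml ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION (sqlserver.sql_text, sqlserver.username)
    WHERE sqlserver.like_i_sql_unicode_string(sqlserver.sql_text, N'%your table name here%')
      AND sqlserver.database_id = 5          -- DB_ID() of DatabaseA; predicates need a literal
)
ADD TARGET package0.asynchronous_file_target
    (SET filename     = N'C:\XE\watch_tableA_dml.xel',
         metadatafile = N'C:\XE\watch_tableA_dml.xem');

ALTER EVENT SESSION watch_tableA_dml ON SERVER STATE = START;

The Agent job (or XESmartTarget, linked above) would then read the target file and load the events into a table in databaseB for the trigger to act on.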
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We have recently been using StreamSets Data Collector for Oracle CDC, but it also supports SQL Server CDC. There are many other competing technologies, including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build real-time data pipelines between databases at the schema level. Until now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can copy all the tables in a schema in near real time in one StreamSets pipeline, provided I have already deployed the DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
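For what it's worth, the log does record the security identifier of the login that began each transaction; a heavily hedged sketch using the undocumented, unsupported fn_dblog function (column names can vary by version, so treat this as exploratory only):

-- Read the active portion of the log and map each transaction's security
-- identifier back to a login name.
SELECT [Transaction ID],
       [Transaction Name],
       [Begin Time],
       SUSER_SNAME([Transaction SID]) AS login_name
FROM   fn_dblog(NULL, NULL)
WHERE  Operation = 'LOP_BEGIN_XACT';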
I looked at both solutions provided at the time of writing this answer (see the answers by Dan Flippo and dfundaka) but found that the first - using Change Data Capture - required modification to the database, and the second - using Extended Events - wasn't really a complete answer, though it got me thinking of other options.
And the option that seems cleanest, and doesn't require any database modification, is to use the SQL Server Dynamic Management Views. These system views and functions, exposed through the sys schema, let you view server execution history - in this case INSERTs and UPDATEs - via objects such as sys.dm_exec_sql_text and sys.dm_exec_query_stats, which hold records of the statements the server has executed (and appear to be what Extended Events draws on).
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc but for the purposes of this particular exercise, this wasn't an issue.
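As an illustration of the idea (not a complete solution), a query along these lines pulls recently executed statements that touch a given table; the table name is a placeholder, and note that it does not reliably identify the user:

SELECT TOP (50)
       qs.last_execution_time,
       qs.execution_count,
       st.text AS sql_text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st    -- resolve the cached statement text
WHERE  st.text LIKE '%YourTableNameHere%'
  AND (st.text LIKE '%INSERT%' OR st.text LIKE '%UPDATE%')
ORDER BY qs.last_execution_time DESC;

Bear in mind this only sees plans still in the cache, which is part of the retention restriction mentioned above.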
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.

Will appending records to a table using SSIS in Microsoft SQL Server Studio cause any downtime?

I have been tasked with performing a merge of data from one database to another. Both databases will be located on the same server. I am using SSIS within Microsoft SQL Server Management Studio to perform this transfer of records. My question is, when I am performing this merge, will this cause any downtime for applications that rely on the database that the records are being transferred to? If I have not provided enough information for a reliable answer, please feel free to ask for further clarification.
Thank you!
-Dave
By default, SSIS uses the Serializable transaction isolation level (more detail: http://msdn.microsoft.com/en-us/library/ms173763.aspx).
That setting will acquire locks as specified in the above article, and may indeed cause locking issues with applications that use the tables accessed by the package.
The duration of the locks and whether this is a problem in your environment are best determined via testing.
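For that testing, one simple way to see what the package is holding while it runs is to query sys.dm_tran_locks from another session (TargetDatabase is a placeholder name):

SELECT request_session_id,
       resource_type,
       request_mode,                      -- e.g. IX, X, S
       request_status,                    -- GRANT or WAIT
       resource_associated_entity_id
FROM   sys.dm_tran_locks
WHERE  resource_database_id = DB_ID('TargetDatabase');

Rows with request_status = 'WAIT' from your application's sessions are the ones that would show up as blocking during the load.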

Confirming data in a SQL Server 2008 mirror

I have a SQL Server 2008 database set up for mirroring and was wondering if there was any way to generate a report for an audit showing that the data is being mirrored correctly and failing over would not result in any data loss. I can show using the database mirroring monitor that data is being transferred, but need a way to verify that the data matches (preferably without having to break the mirror).
Just query sys.database_mirroring: if mirroring_state_desc is 'SYNCHRONIZED', then the data is in the mirror. Make sure the transaction safety (mirroring_safety_level_desc) is FULL to guarantee no data loss on failover; see Mirroring states, quoted below, with a query sketch after them:
If transaction safety is set to FULL, automatic failover and manual failover are both supported in the SYNCHRONIZED state; there is no data loss after a failover.
If transaction safety is off, some data loss is always possible, even in the SYNCHRONIZED state.
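A minimal sketch of that check, run on the principal:

SELECT DB_NAME(database_id)         AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,          -- expect SYNCHRONIZED
       mirroring_safety_level_desc    -- expect FULL for no data loss on failover
FROM   sys.database_mirroring
WHERE  mirroring_guid IS NOT NULL;    -- only databases that are actually mirrored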
If the auditors don't trust the official product documentation, you can show the data content of a database snapshot of the mirror, since mirrors themselves are not accessible. See Database Snapshots. Obviously, to do a meaningful comparison with a frozen snapshot you would have to freeze the source first, take the snapshot on the mirror, run the comparison, then unfreeze the source. This implies the database is read-only for the duration; any change will cause it to diverge from the snapshot and fail the comparison. An exercise in futility, with downtime, as the documentation clearly states that a synchronized, fully protected mirror is guaranteed to be identical to the source.
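For completeness, if the auditors insist on the snapshot route anyway, creating a snapshot on the mirror looks roughly like this (database, logical file and path names are placeholders):

-- Run on the mirror server; the snapshot is read-only and can be queried for a
-- point-in-time comparison against the (frozen) principal.
CREATE DATABASE MirroredDb_Snap
ON (NAME = MirroredDb_Data,                        -- logical data file name of the mirrored database
    FILENAME = 'D:\Snapshots\MirroredDb_Snap.ss')
AS SNAPSHOT OF MirroredDb;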

Using SAVE TRANSACTION with a linked server

Inside a transaction that has a savepoint, I have to do a join with a table that is on a linked server. When I try to do it, I get the error message:
“Cannot use SAVE TRANSACTION within a distributed transaction”
The remote table's data rarely changes; it is almost fixed. Is it possible to tell SQL Server to exclude this table from the transaction? I've tried a (NOLOCK) hint, but it isn't possible to use this hint for a table on a linked server.
Does anyone know of a workaround? I'm using the old SQL Server 2000.
One thing that you could do is to make a local copy of the remote table before you start the transaction. I know that this may sound like a lot of overhead, but remote joins are frequently a performance problem anyway and the SOP fix for that is also to make a local copy.
According to this link, the ability to use SAVEPOINTs in a Distributed transaction was dropped in SQL 7.
To allow application migration from Microsoft SQL Server 6.5 when savepoints inside distributed transactions are in use, Microsoft SQL Server 2000 Service Pack 1 introduces a trace flag that allows a savepoint within a distributed transaction. The trace flag is 8599 and can be turned on during the SQL Server startup or within an individual session (that is, prior to enabling a distributed transaction with a BEGIN DISTRIBUTED TRANSACTION statement) by using the DBCC TRACEON command. When trace flag 8599 is set to ON, SQL Server allows you to use a savepoint within a distributed transaction.
So unfortunately, you may either have to drop the bounding ACID transaction, or change the SPROC on the remote server so that it doesn't use SAVEPOINTs.
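If the trace flag route from the quoted article is an option in your environment (SQL Server 2000 SP1 or later), the shape of it is roughly this; treat it as a sketch and test it carefully, since the flag is a compatibility switch rather than a general feature:

DBCC TRACEON (8599);               -- allow savepoints inside distributed transactions (session level)

BEGIN DISTRIBUTED TRANSACTION;
    SAVE TRANSACTION BeforeRemoteJoin;
    -- ... the join against the linked-server table goes here ...
    -- IF @@ERROR <> 0 ROLLBACK TRANSACTION BeforeRemoteJoin;
COMMIT TRANSACTION;

DBCC TRACEOFF (8599);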
On a side note: although I see you have tagged this SQL Server 2000, it is worth pointing out that SQL Server 2008 has the remote proc trans option for this.
In this case, if the remote table is not too large, I would copy it to a temp table. If possible, include any filtering to get the number of rows to a minimum. Then you can proceed normally. Another option, since the data rarely changes, is to copy the data to a permanent table and check whether anything has changed, to avoid sending too much data over the network every time you run the transaction. You could pull over only the recent changes.
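A rough sketch of the copy-first approach (LinkedSrv, RemoteDb and dbo.RemoteRef are placeholder names):

-- Copy the (filtered) remote data locally before the transaction starts, so the
-- transaction itself never touches the linked server and SAVE TRANSACTION is allowed.
SELECT col1, col2
INTO   #remote_ref
FROM   LinkedSrv.RemoteDb.dbo.RemoteRef
WHERE  col2 = 'ACTIVE';            -- filter to keep the copied row count small

BEGIN TRANSACTION;
    SAVE TRANSACTION BeforeWork;
    -- ... join against #remote_ref here instead of the remote table ...
COMMIT TRANSACTION;

DROP TABLE #remote_ref;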
If you wish to handle the transaction at the UI level and you have Visual Studio 2008/.NET Framework 3.5 or later, you can wrap your logic with the TransactionScope class. If you don't have any front ends and you are working only with SQL Server, kindly ignore my answer...

How to Audit Database Activity without Performance and Scalability Issues?

I need to audit all database activity regardless of whether it comes from the application or from someone issuing SQL via other means, so the auditing must be done at the database level. The database in question is Oracle. I looked at doing it via triggers and also via something called Fine-Grained Auditing that Oracle provides. In both cases, we turned on auditing on specific tables and specific columns. However, we found that performance really sucks when we use either of these methods.
Since auditing is an absolute must due to regulations placed around data privacy, I am wondering what the best way is to do this without significant performance degradation. If someone has Oracle-specific experience with this, it would be helpful, but if not, general practices around database activity auditing are okay as well.
I'm not sure if it's a mature enough approach for a production system, but I had quite a lot of success with monitoring database traffic using a network traffic sniffer.
Send the raw data between the application and database off to another machine and decode and analyse it there.
I used PostgreSQL, and decoding the traffic and turning it into a stream of database operations that could be logged was relatively straightforward. I imagine it'd work on any database where the packet format is documented though.
The main point was that it put no extra load on the database itself.
Also, it was passive monitoring: it recorded all activity, but couldn't block any operations, so it might not be quite what you're looking for.
There is no need to "roll your own". Just turn on auditing:
Set the database parameter AUDIT_TRAIL = DB.
Start the instance.
Log in with SQL*Plus.
Enter the statement audit all; This turns on auditing for many critical DDL operations, but DML and some other DDL statements are still not audited.
To enable auditing on these other activities, try statements like these:
audit alter table; -- DDL audit
audit select table, update table, insert table, delete table; -- DML audit
Note: All "as sysdba" activity is ALWAYS audited to the O/S. In Windows, this means the Windows event log. In UNIX, this is usually $ORACLE_HOME/rdbms/audit.
Check out the Oracle 10g R2 Audit Chapter of the Database SQL Reference.
The database audit trail can be viewed in the SYS.DBA_AUDIT_TRAIL view.
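For example, something along these lines shows who did what to a given table, assuming AUDIT_TRAIL=DB and the audit statements above (MY_SCHEMA and MY_TABLE are placeholders):

SELECT os_username,
       username,
       obj_name,
       action_name,     -- e.g. INSERT, UPDATE, DELETE, ALTER
       timestamp,
       returncode       -- 0 means the statement succeeded
FROM   sys.dba_audit_trail
WHERE  owner    = 'MY_SCHEMA'
AND    obj_name = 'MY_TABLE'
ORDER  BY timestamp DESC;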
It should be pointed out that the internal Oracle auditing will be high-performance by definition. It is designed to be exactly that, and it is very hard to imagine anything else rivaling it for performance. Also, there is a high degree of "fine-grained" control of Oracle auditing. You can get it just as precise as you want it. Finally, the SYS.AUD$ table along with its indexes can be moved to a separate tablespace to prevent filling up the SYSTEM tablespace.
Kind regards,
Opus
If you want to record copies of changed records on a target system you can do this with Golden Gate Software and not incur much in the way of source side resource drain. Also you don't have to make any changes to the source database to implement this solution.
Golden Gate scrapes the redo logs for transactions referring to a list of tables you are interested in. These changes are written to a 'Trail File' and can be applied to a different schema on the same database, or shipped to a target system and applied there (ideal for reducing load on your source system).
Once you get the trail file to the target system, there are some configuration tweaks: you can set an option to perform auditing, and if needed you can invoke two GoldenGate functions to get info about the transaction:
1) Set the INSERTALLRECORDS Replication parameter to insert a new record in the target table for every change operation made to the source table. Beware this can eat up a lot of space, but if you need comprehensive auditing this is probably expected.
2) If you don't already have a CHANGED_BY_USERID and CHANGED_DATE attached to your records, you can use the Golden Gate functions on the target side to get this info for the current transaction. Check out the following functions in the GG Reference Guide:
GGHEADER("USERID")
GGHEADER("TIMESTAMP")
So no, it's not free (it requires licensing through Oracle) and it will require some effort to spin up, but probably a lot less effort/cost than implementing and maintaining your own custom solution, and you have the added benefit of shipping the data to a remote system so you can guarantee minimal impact on your source database.
If you are using Oracle, there is a feature called CDC (Change Data Capture), which is a more performance-efficient solution for audit-type requirements.
