This is a Change Data Capture scenario, but instead of enabling CDC on tables, I would like to read the database transaction logs and filter them for certain tables.
For instance, I want to know all updates, deletes, and all DDL on certain tables and then stream that log entry into ElasticSearch whenever one is found.
What are some solutions out there that can let me monitor database logs live and stream to ElasticSearch?
Appreciate any feedback
Each RDBMS has its own proprietary transaction log format, sometimes version dependent. A few are documented, most are not. Some are intuitive, most are not.
There are companies selling CDC tools which know how to interpret those logs, and they have spent multiple man-years to properly and consistently interpret the transaction log. Attunity was one such company; their "Replicate" product is now available through Qlik.
For SQL Server you can call fn_dblog (for the live log) and fn_dump_dblog (for the archived logs) to readily get to the raw transaction row data. Next you'll need to figure out how to interpret that. Google (or DuckDuckGo) is your friend.
Start here perhaps: https://www.sqlserverlogexplorer.com/reading-sql-server-transaction-logs/
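As a rough, unsupported sketch (these functions are undocumented, their output columns can change between versions, and the table name below is hypothetical):

-- Undocumented/unsupported: list log records touching a hypothetical table dbo.MyTable.
SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName
FROM fn_dblog(NULL, NULL)                  -- live transaction log
WHERE AllocUnitName LIKE 'dbo.MyTable%';   -- filter to the table of interest

fn_dump_dblog works similarly against a log backup, but it takes a long parameter list; see the linked article before using it.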
You can only do this on your own for a very limited set of RDBMSs, and within those, for a very limited set of row and column datatypes. Once you try to make a generic solution it turns into a career!
Good luck!
Hein.
Related
What's the best way to track/log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008?
Or is there a better "Audit" feature in SQL Server 2008?
The short answer is that there is no single solution that fits all. It depends on the system and the requirements, but here are a couple of different approaches.
DML Triggers
Relatively easy to implement, because you write one trigger that works well for one table and then apply the same pattern to the other tables (a rough sketch follows the links below).
The downside is that it can get messy when you have a lot of tables and even more triggers. Managing 600 triggers for 200 tables (an insert, update, and delete trigger per table) is not an easy task.
Also, it might cause a performance impact.
Creating audit triggers in SQL Server
Log changes to database table with trigger
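As a minimal sketch of the idea (the dbo.Orders source table, the dbo.Orders_Audit table, and its columns are all hypothetical, so adjust to your schema):

-- One trigger covering inserts, updates, and deletes; writes one row per change to a hypothetical audit table.
CREATE TRIGGER trg_Orders_Audit
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows only in INSERTED are inserts, rows in both are updates, rows only in DELETED are deletes.
    INSERT INTO dbo.Orders_Audit (OrderId, ChangeType, ChangedAt, ChangedBy)
    SELECT COALESCE(i.OrderId, d.OrderId),
           CASE WHEN i.OrderId IS NOT NULL AND d.OrderId IS NOT NULL THEN 'U'
                WHEN i.OrderId IS NOT NULL THEN 'I'
                ELSE 'D'
           END,
           SYSUTCDATETIME(),
           SUSER_SNAME()
    FROM inserted AS i
    FULL OUTER JOIN deleted AS d ON i.OrderId = d.OrderId;
END;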
Change Data Capture
Very easy to implement and natively supported, but only in Enterprise Edition, which can cost a lot of $ ;). Another disadvantage is that CDC is still not as mature as it should be. For example, if you change your schema, history data is lost.
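Enabling it looks roughly like this (a sketch; dbo.Orders is a hypothetical table, and CDC also needs SQL Server Agent running to harvest the log):

-- Enable CDC at the database level, then per table.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',   -- hypothetical table name
    @role_name     = NULL;        -- no gating role for reading the changes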
Transaction log analysis
The biggest advantage of this is that all you need to do is put the database in the full recovery model, and all the information will be stored in the transaction log.
However, if you want to do this correctly you'll need a third-party log reader, because this is not natively supported.
Read the log file (*.LDF) in SQL Server 2008
SQL Server Transaction Log Explorer/Analyzer
If you want to implement this, I'd recommend you try out some of the third-party tools that exist out there. I've worked with a couple of tools from ApexSQL, but there are also good tools from Idera and Netwrix.
ApexSQL Log – auditing by reading transaction log
ApexSQL Comply – uses traces in the background, then parses those traces and stores the results in a central database.
Disclaimer: I’m not affiliated with any of the companies mentioned above.
Change Data Capture is designed to do what you want, but it requires each table to be set up individually, so depending on the number of tables you have, there may be some logistics to it. It will also only keep the data in the capture tables for a few days by default, so you may need an SSIS package to pull it out and store it for longer periods.
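Pulling the changes out looks roughly like this (a sketch; the capture instance name dbo_Orders is hypothetical and is generated when you enable CDC on the table):

-- Read all captured changes for a hypothetical dbo_Orders capture instance.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');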
I don't remember whether there is already some tool for this, but you could always use triggers (then you will have access to the virtual tables with the changed rows, INSERTED and DELETED). Unfortunately, it could be quite a lot of work if you want to track all tables. I believe there should be some simpler solution, but as I said, I don't remember one.
EDIT.
Maybe this could be helpful:
Change Tracking:
http://msdn.microsoft.com/en-us/library/cc280462.aspx
http://msdn.microsoft.com/en-us/library/cc280386.aspx
This allows you to do audits at the database level; it may or may not be enough to meet the business requirements, as database records usually don't mean much without the logic that glues them together. For instance, knowing that user x inserted a record into the "time_booked" table with foreign keys to the "projects", "users", and "time_status" tables may not mean much without the SQL query that joins those four tables.
You may also need to have each database user connect with their own user ID - this is fine with integrated security and a client app, but probably won't work with a website using a connection pool.
The SQL Server logs cannot be analyzed just like that. There are some third-party tools available to read the logs, but as far as I know you can't query them for statistics and such. If you need this kind of information, you'll have to create some sort of auditing to capture all these events in separate tables. For schema changes you can use "DDL triggers".
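A minimal DDL trigger sketch (the dbo.DdlAudit table is hypothetical; EVENTDATA() returns the event details as XML):

-- Database-scoped DDL trigger that logs every DDL event to a hypothetical audit table.
CREATE TRIGGER trg_DdlAudit
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    INSERT INTO dbo.DdlAudit (EventData, PostedAt)
    VALUES (EVENTDATA(), SYSUTCDATETIME());
END;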
Does anyone know how the underlying replication model in SQL Server works? Does it essentially depend on UTC datetime values to determine whether something is new, or does it keep a table of all the changes (like a table of tableID + rowID pairs that have changed)?
I am building my own "replication" system and was planning on using the dates to know what to replicate. Then I started wondering what would happen if the date got off on the computer for some reason. The obvious choice is to keep a log of the changes as you go and, once you replicate those changes, remove them from the log. But that's a lot of extra work compared to just checking dates.
I figure if sql server replication works by just checking the dates, then that should be good enough for me.
Any wisdom here?
thanks
As a transaction occurs in SQL Server, it is written to the transaction log along with information pertinent to the transaction.
SQL Server replication uses this transaction log to determine which transactions have not yet been processed and to move them to the subscriber. There is a lot more going on under the hood to keep track of the intersection between transactions, publications, subscriptions, etc. but I will leave that to MSDN documentation about SQL Server replication http://msdn.microsoft.com/en-us/library/ms151198.aspx
Moving on to your point about building your own replication system:
Do not build your own replication system. There are too many complications involved that will cause you to spend many, many days working. You will be much better off using the items that ship with SQL Server.
SQL Server replication methods are pretty impressive out of the box.
If you outline what causes you to think in terms of building your own replication system, we can help you figure out how to use existing items to provision what you need.
Also, read up as much as you can here to get an idea of what it can do for you http://msdn.microsoft.com/en-us/library/ms151198.aspx
SQL Server has a LogReader job that is aptly named. Replication reads the transaction log and applies appropriate transactions to the subscribing databases.
For one thing, SQL Server (and it's not the only one) supports multiple replication algorithms.
You can find details here about the ones implemented in SQL Server 2008: read the "X Replication Overview" topic first, then follow "How X Replication Works" for more details (where X is Snapshot, Transactional, or Merge).
How can I query the read/write ratio in Sql Server 2005? Are there any caveats I should be aware of?
Perhaps it can be found with a DMV query, a standard report, a custom report (e.g. the Performance Dashboard), or by examining a SQL Profiler trace. I'm not sure exactly.
Why do I care?
I'm taking time to improve the performance of my web app's data layer. It deals with millions of records and thousands of users.
One of the points I'm examining is database concurrency. SQL Server uses pessimistic concurrency by default, which is good for a write-heavy app. If my app is read-heavy, I might switch it to optimistic concurrency (isolation level: read committed snapshot), like Jeff Atwood did with StackOverflow.
All apps are heavily read-oriented.
An UPDATE is a read for the WHERE clause followed by a write.
An INSERT must check unique indexes and FKs, which are reads, and that is why you index FK columns.
At most you have 15% writes. I saw an article discussing it once, but can't find it again. More likely it's around 1%.
I know that in our 6 million new rows per day DB, we still have a minimum of 95%+ reads (an estimate of course).
Why do you need to know?
Also: How to find out SQL Server table’s read/write statistics?
Edit, based on the question update...
I would leave DB concurrency alone until you need to change it. We've not changed anything out of the box for our 6 million rows per day plus heavy reads either.
For tuning our web app, we designed it to reduce round trips (one call = one action, multiple record sets per call, etc.).
Check out sys.dm_db_index_usage_stats:
seeks, scans, lookups are all reads
updates are writes
Keep in mind that the counters are reset with each server restart, so you should look at them only after a representative load has run.
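A rough aggregation sketch (reads vs. writes per database; the exact split you care about may differ):

-- Sum index-level read and write activity per database since the last restart.
SELECT DB_NAME(database_id) AS database_name,
       SUM(user_seeks + user_scans + user_lookups) AS reads,
       SUM(user_updates)                           AS writes
FROM sys.dm_db_index_usage_stats
GROUP BY database_id
ORDER BY database_name;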
There are also some performance counters that can help you:
Batch Requests/sec: number of Transact-SQL command batches received per second.
Write Transactions/sec: number of transactions that wrote to the database and committed.
Transactions/sec: number of transactions started for the database.
From these rates you can get a pretty good estimate of read:write ratio of your requests.
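You can read those counters from T-SQL as well; note that the "/sec" counters are cumulative, so you need two samples over an interval to compute a rate (a sketch):

-- Snapshot the cumulative counter values; sample twice and diff to get per-second rates.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Batch Requests/sec', 'Write Transactions/sec', 'Transactions/sec');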
After your update:
Turning on the version store is probably the best avenue for dealing with concurrency. Rather than using the snapshot isolation explicitly, I'd recommend turning on read committed snapshot:
alter database <dbname> set allow_snapshot_isolation on;
alter database <dbname> set read_committed_snapshot on;
This will make read committed reads (i.e. the default ones) use the snapshot (row versioning) instead, so it literally doesn't require any change in the app and can be quickly tested.
You should also investigate whether your reads are being executed under the serializable isolation level, which is what happens when a TransactionScope is used without explicitly specifying the isolation level.
One word of caution: the version store is not exactly free. See Row Versioning Resource Usage, and give SQL Server 2005 Row Versioning-Based Transaction Isolation a read.
How about finding a ratio of num_of_writes & num_of_reads counters in sys.dm_io_virtual_file_stats?
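Something like this (a sketch at the I/O level, which counts physical file reads/writes rather than logical query activity):

-- I/O-level read/write counts per database from the virtual file stats DMF.
SELECT DB_NAME(vfs.database_id) AS database_name,
       SUM(vfs.num_of_reads)  AS reads,
       SUM(vfs.num_of_writes) AS writes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
GROUP BY vfs.database_id;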
I did it using SQL Server Profiler. I just opened it before running the application and checked what kind of queries were executed while I was doing something in the application. I think it's better just for making sure that queries work; I don't know if it is convenient for measuring server workload like this. Profiler can also save traces which you can analyse later, so it might work.
Change Data Capture is a new feature in SQL Server 2008. From MSDN:
Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by using an asynchronous process that reads the transaction log and has a low impact on the system.
This is highly sweet - no more adding CreatedDate and LastModifiedBy columns manually.
Does Oracle have anything like this?
Sure. Oracle actually has a number of technologies for this sort of thing depending on the business requirements.
Oracle has had something called Workspace Manager for a long time (8i days) that allows you to version-enable a table and track changes over time. This can be a bit heavyweight, though, because it is based on views with instead-of triggers.
Starting in 11.1 (as an extra-cost option to Enterprise Edition), Oracle has Total Recall, which asynchronously mines the redo logs for data changes; the changes get logged to a separate table, which can then be queried using flashback query syntax on the main table. Total Recall automatically partitions and compresses the historical data and takes care of purging it after a specified retention period.
Oracle has a LogMiner technology that mines the redo logs and presents transactions to consumers. There are a number of technologies that are then built on top of LogMiner including Change Data Capture and Streams.
You can also use materialized views and materialized view logs if the goal is to replicate changes.
Oracle has Database Change Notification, where you register a query with the system and the resources accessed in that query are tagged to be watched. Changes to those resources are queued by the system, allowing you to run procedures against the data.
This is managed using the DBMS_CHANGE_NOTIFICATION package.
Here's an infodoc about it:
http://www.oracle-base.com/articles/10g/dbms_change_notification_10gR2.php
If you are connecting to Oracle from a C# app, ODP.NET (Oracle's .NET client library) can interact with Change Notification to alert your C# app when Oracle changes are made; pretty cool. Goodbye to polling repeatedly for data changes, if you ask me: just register the table, set up change notification through ODP.NET, and voilà, C# methods get called only when necessary. Woot!
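On the database side, a registration sketch following the pattern in the linked article might look like this (chnf_callback is a hypothetical callback procedure you must create yourself with a CQ_NOTIFICATION$_DESCRIPTOR parameter, and hr.employees is just an example table; treat this as an assumption-laden outline rather than a recipe):

-- Register interest in the objects referenced by any query run between NEW_REG_START and REG_END.
DECLARE
  l_reginfo  CQ_NOTIFICATION$_REG_INFO;
  l_regid    NUMBER;
  l_dummy    NUMBER;
BEGIN
  l_reginfo := CQ_NOTIFICATION$_REG_INFO(
                 'chnf_callback',                          -- hypothetical callback procedure
                 DBMS_CHANGE_NOTIFICATION.QOS_ROWIDS,      -- ask for ROWID-level detail
                 0, 0, 0);
  l_regid := DBMS_CHANGE_NOTIFICATION.NEW_REG_START(l_reginfo);

  SELECT COUNT(*) INTO l_dummy FROM hr.employees;          -- the watched query

  DBMS_CHANGE_NOTIFICATION.REG_END;
END;
/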
"no more adding CreatedDate and LastModifiedBy columns manually" ... as long as you can afford to keep complete history of your database online in the redo logs and never want to move the data to a different database.
I would keep adding them and avoid relying on built-in database techniques like that. If you have a need to keep historical status of records then use an audit table or ship everything off to a data warehouse that handles slowly changing dimensions properly.
Having said that, I'll add that Oracle 10g+ can query historical row versions simply by using flashback query syntax. Examples here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2112847
This technology is also used in Oracle's Datapump export utility to provide consistent data for multiple tables.
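A minimal sketch of the AS OF form (the orders table, column, and one-hour window are just illustrative):

-- Read a hypothetical table as it looked one hour ago.
SELECT *
FROM   orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)
WHERE  order_id = 42;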
I believe Oracle has provided auditing features since 8i; however, the tables used to capture the data are rather complex, and there is a significant performance impact when this is turned on.
In Oracle 8i you could only enable this for an entire database and not a table at a time; however, 9i introduced Fine-Grained Auditing, which provides far more flexibility. This has been expanded upon in 10g/11g.
For more information see http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Also, in 11g Oracle introduced Audit Vault, which provides secure storage for audit information; even DBAs cannot change this data (according to Oracle's documentation; I haven't used this feature yet). More info can be found at http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Oracle has a mechanism called Flashback Data Archive. From A Fresh Look at Auditing Row Changes:
Oracle Flashback Query retrieves data as it existed at some time in the past.
Flashback Data Archive provides the ability to track and store all transactional changes to a table over its lifetime. It is no longer necessary to build this intelligence into your application. A Flashback Data Archive is useful for compliance with record stage policies and audit reports.
CREATE TABLESPACE SPACE_FOR_ARCHIVE
datafile 'C:\ORACLE DB12\ARCH_SPACE.DBF' size 50G;
CREATE FLASHBACK ARCHIVE longterm
TABLESPACE space_for_archive
RETENTION 1 YEAR;
ALTER TABLE EMPLOYEES FLASHBACK ARCHIVE LONGTERM;
select EMPLOYEE_ID, FIRST_NAME, JOB_ID, VACATION_BALANCE,
VERSIONS_STARTTIME TS,
nvl(VERSIONS_OPERATION,'I') OP
from EMPLOYEES
versions between timestamp timestamp '2016-01-11 08:20:00' and systimestamp
where EMPLOYEE_ID = 100
order by EMPLOYEE_ID, ts;
I need to audit all database activity regardless of whether it came from the application or from someone issuing SQL via other means, so the auditing must be done at the database level. The database in question is Oracle. I looked at doing it via triggers and also via something called Fine-Grained Auditing that Oracle provides. In both cases, we turned on auditing on specific tables and specific columns. However, we found that performance really sucks when we use either of these methods.
Since auditing is an absolute must due to regulations around data privacy, I am wondering what the best way is to do this without significant performance degradation. Oracle-specific experience would be helpful, but if not, general practices around database activity auditing would be fine as well.
I'm not sure if it's a mature enough approach for a production system, but I had quite a lot of success with monitoring database traffic using a network traffic sniffer.
Send the raw data between the application and the database off to another machine and decode and analyse it there.
I used PostgreSQL, and decoding the traffic and turning it into a stream of database operations that could be logged was relatively straightforward. I imagine it'd work on any database where the packet format is documented, though.
The main point was that it put no extra load on the database itself. Also, it was passive monitoring: it recorded all activity but couldn't block any operations, so it might not be quite what you're looking for.
There is no need to "roll your own". Just turn on auditing:
Set the database parameter AUDIT_TRAIL = DB.
Start the instance.
Log in with SQL*Plus.
Enter the statement audit all;
This turns on auditing for many critical DDL operations, but DML and some other DDL statements are still not audited. To enable auditing on these other activities, try statements like these:
audit alter table; -- DDL audit
audit select table, update table, insert table, delete table; -- DML audit
Note: All "as sysdba" activity is ALWAYS audited to the O/S. In Windows, this means the Windows event log. In UNIX, this is usually $ORACLE_HOME/rdbms/audit.
Check out the Oracle 10g R2 Audit Chapter of the Database SQL Reference.
The database audit trail can be viewed in the SYS.DBA_AUDIT_TRAIL view.
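For example, a quick look at recent audited actions might be (a sketch; adjust the columns and filters to what you need):

-- Recent entries in the standard audit trail (AUDIT_TRAIL = DB).
SELECT username, obj_name, action_name, timestamp, returncode
FROM   sys.dba_audit_trail
ORDER BY timestamp DESC;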
It should be pointed out that the internal Oracle auditing will be high-performance by definition. It is designed to be exactly that, and it is very hard to imagine anything else rivaling it for performance. Also, there is a high degree of "fine-grained" control of Oracle auditing. You can get it just as precise as you want it. Finally, the SYS.AUD$ table along with its indexes can be moved to a separate tablespace to prevent filling up the SYSTEM tablespace.
Kind regards,
Opus
If you want to record copies of changed records on a target system, you can do this with Golden Gate Software without incurring much in the way of source-side resource drain. Also, you don't have to make any changes to the source database to implement this solution.
Golden Gate scrapes the redo logs for transactions referring to a list of tables you are interested in. These changes are written to a 'Trail File' and can be applied to a different schema on the same database, or shipped to a target system and applied there (ideal for reducing load on your source system).
Once you get the trail file to the target system, there are some configuration tweaks: you can set an option to perform auditing, and if needed you can invoke two Golden Gate functions to get info about the transaction:
1) Set the INSERTALLRECORDS Replication parameter to insert a new record in the target table for every change operation made to the source table. Beware this can eat up a lot of space, but if you need comprehensive auditing this is probably expected.
2) If you don't already have a CHANGED_BY_USERID and CHANGED_DATE attached to your records, you can use the Golden Gate functions on the target side to get this info for the current transaction. Check out the following functions in the GG Reference Guide:
GGHEADER("USERID")
GGHEADER("TIMESTAMP")
So no, it's not free (it requires licensing through Oracle) and will require some effort to spin up, but probably a lot less effort/cost than implementing and maintaining a custom roll-your-own solution, and you have the added benefit of shipping the data to a remote system so you can guarantee minimal impact on your source database.
If you are using Oracle, there is a feature called CDC (Change Data Capture), which is a more performance-efficient solution for audit-type requirements.