How to Audit Database Activity without Performance and Scalability Issues? - database

I have a need to do auditing all database activity regardless of whether it came from application or someone issuing some sql via other means. So the auditing must be done at the database level. The database in question is Oracle. I looked at doing it via Triggers and also via something called Fine Grained Auditing that Oracle provides. In both cases, we turned on auditing on specific tables and specific columns. However, we found that Performance really sucks when we use either of these methods.
Since auditing is an absolute must due to regulations placed around data privacy, I am wondering what is best way to do this without significant performance degradations. If someone has Oracle specific experience with this, it will be helpful but if not just general practices around database activity auditing will be okay as well.

I'm not sure if it's a mature enough approach for a production
system, but I had quite a lot of success with monitoring database
traffic using a network traffic sniffer.
Send the raw data between the application and database off to another
machine and decode and analyse it there.
I used PostgreSQL, and decoding the traffic and turning it into
a stream of database operations that could be logged was relatively
straightforward. I imagine it'd work on any database where the packet
format is documented though.
The main point was that it put no extra load on the database itself.
Also, it was passive monitoring, it recorded all activity, but
couldn't block any operations, so might not be quite what you're looking for.

There is no need to "roll your own". Just turn on auditing:
Set the database parameter AUDIT_TRAIL = DB.
Start the instance.
Login with SQLPlus.
Enter the statement audit all;This turns on auditing for many critical DDL operations, but DML and some other DDL statements are still not audited.
To enable auditing on these other activities, try statements like these:audit alter table; -- DDL audit
audit select table, update table, insert table, delete table; -- DML audit
Note: All "as sysdba" activity is ALWAYS audited to the O/S. In Windows, this means the Windows event log. In UNIX, this is usually $ORACLE_HOME/rdbms/audit.
Check out the Oracle 10g R2 Audit Chapter of the Database SQL Reference.
The database audit trail can be viewed in the SYS.DBA_AUDIT_TRAIL view.
It should be pointed out that the internal Oracle auditing will be high-performance by definition. It is designed to be exactly that, and it is very hard to imagine anything else rivaling it for performance. Also, there is a high degree of "fine-grained" control of Oracle auditing. You can get it just as precise as you want it. Finally, the SYS.AUD$ table along with its indexes can be moved to a separate tablespace to prevent filling up the SYSTEM tablespace.
Kind regards,
Opus

If you want to record copies of changed records on a target system you can do this with Golden Gate Software and not incur much in the way of source side resource drain. Also you don't have to make any changes to the source database to implement this solution.
Golden Gate scrapes the redo logs for transactions referring to a list of tables you are interested in. These changes are written to a 'Trail File' and can be applied to a different schema on the same database, or shipped to a target system and applied there (ideal for reducing load on your source system).
Once you get the trail file to the target system there are some configuration tweaks you can set an option to perform auditing and if needed you can invoke 2 Golden Gate functions to get info about the transaction:
1) Set the INSERTALLRECORDS Replication parameter to insert a new record in the target table for every change operation made to the source table. Beware this can eat up a lot of space, but if you need comprehensive auditing this is probably expected.
2) If you don't already have a CHANGED_BY_USERID and CHANGED_DATE attached to your records, you can use the Golden Gate functions on the target side to get this info for the current transaction. Check out the following functions in the GG Reference Guide:
GGHEADER("USERID")
GGHEADER("TIMESTAMP")
So no its not free (requires Licensing through Oracle), and will require some effort to spin up, but probably a lot less effort/cost than implementing and maintaining a custom solution rolling your own, and you have the added benefit of shipping the data to a remote system so you can guarantee minimal impact on your source database.

if you are using oracle then there is feature called CDC(Capture data change) which is more performance efficient solution for audit kind of requirements.

Related

SQL Server replication/redirection for heavy jobs

Need some sanity check.
Imagine having 1 SQL Server instance, a beefy system (i.e 48GB of RAM and tons of storage). Obviously there comes a point where it gets hammered in a situation where there are lots of jobs running.
These jobs/DB are part of an external piece of software and cannot be controlled or modified by us directly.
Now, when these jobs run, besides the queries probably being inefficient, do bring the DB down - they become very slow so any "regular" users are having slow responses.
The immediate thing I can think of is replication of some kind where maybe, the "secondary" DB would be the one where these jobs point to and do their hammering, still leaving the primary available and active but would receive any updates from secondary for data consistency/integrity.
Would this be the right thing to do? Ultimately I want the load to be elsewhere but have the primary be aware of updates and update itself without bringing it down or being very slow.
What is this called in MS SQL Server? Does such a thing exist? The jobs will be doing a read-write FYI.
There are numerous approaches to this, all of which are native to SQL Server, but I think you should look into Transactional Replication:
https://learn.microsoft.com/en-us/sql/relational-databases/replication/transactional/transactional-replication?view=sql-server-ver16
It effectively creates a read-only replica based on log shipping that, for reporting purposes, is practically real time.
From the documentation:
"By default, Subscribers to transactional publications should be treated as read-only, because changes are not propagated back to the Publisher. However, transactional replication does offer options that allow updates at the Subscriber."
Your scenario likely has nuances I don't know about, but you can use various flavors of SQL Replication, custom triggers, linked servers, 3-part queries, etc. to fill in the holes.

Detect Table Changes In A Database Without Modifications

I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free-reign to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an extended events session on databaseB that watches for all the procedures on databaseA that can insert into the table or any sql statements that run against the table on databaseA (using a LIKE '%your table name here%').
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic functionality by writing the XE events table to a custom user table every 1 minute or so using the SQL job agent.
Your session would monitor databaseA, write the XE output to databaseB, you write a trigger that upon each XE output write, it would compare the two tables and if there are differences, write the differences to your log table. This would be a nonstop running process, but it is still kind of a period scan in a way. The XE only writes when the event happens, but it is still running a check every couple of seconds.
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We are recently using StreamSets Data Collector for Oracle CDC but it also has SQL Server CDC. There are many other competing technologies including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build realtime data pipelines between DB at the schema level. Till now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can near real-time copy all the tables in a schema in one StreamSets pipeline provided I already deployed DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
I looked at both solutions provided by the time of writing this answer (refer Dan Flippo and dfundaka) but found that the first - using Change Data Capture - required modification to the database and the second - using Extended Events - wasn't really a complete answer, though it got me thinking of other options.
And the option that seems cleanest, and doesn't require any database modification - is to use SQL Server Dynamic Management Views. Within this library residing, in the System database, are various procedures to view server process history - in this case INSERTs and UPDATEs - such as sys.dm_exec_sql_text and sys.dm_exec_query_stats which contain records of database transactions (and are, in fact, what Extended Events seems to be based on).
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc but for the purposes of this particular exercise, this wasn't an issue.
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.

Log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008

Whats the best way to track/Log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008?
Or is there a better "Audit" feature in SQL Server 2008?
Short answer is that there is no one single solution fits all. It depends on the system but and requirements but here are couple different approaches.
DML Triggers
Relatively easy to implement, because you have to write one that works well for one table and then apply it to other tables.
Downside is that it can get messy when you have a lot of tables and even more triggers. Managing 600 triggers for 200 tables (insert, update and delete trigger per table) is not an easy task.
Also, it might cause a performance impact.
Creating audit triggers in SQL Server
Log changes to database table with trigger
Change Data Capture
Very easy to implement, natively supported but only in enterprise edition which can cost a lot of $ ;). Another disadvantage is that CDC is still not as evolved as it should be. For example, if you change your schema, history data is lost.
Transaction log analysis
Biggest advantage of this is that all you need to do is to put the database in full recovery mode and all info will be stored in transaction log
However, if you want to do this correctly you’ll need a third party log reader because this is not natively supported.
Read the log file (*.LDF) in SQL Server 2008
SQL Server Transaction Log Explorer/Analyzer
If you want to implement this I’d recommend you try out some of the third party tools that exist out there. I worked with couple tools from ApexSQL but there are also good tools from Idera and Netwrix
ApexSQL Log – auditing by reading transaction log
ApexSQL Comply – uses traces in the background and then parses those traces and stores results in central database.
Disclaimer: I’m not affiliated with any of the companies mentioned above.
Change Data Capture is designed to do what you want, but it requires each table be set up individually, so depending on the number of tables you have, there may be some logistics to it. It will also only store the data in capture tables for a couple of days by default, so you may need an SSIS package to pull it out and store for longer periods.
I don't remember whether there is already some tool for this, but you could always use triggers (then you will have access for temporal tables with changed rows- INSERTED and DELETED). Unfortunately, it could be quite a work to do if you would like to track all tables. I believe that there should be some simpler solution, but do not remember as I said.
EDIT.
Maybe this could be helpful:
--Change tracking
http://msdn.microsoft.com/en-us/library/cc280462.aspx
http://msdn.microsoft.com/en-us/library/cc280386.aspx
This allows you to do audits at the database level; it may or may not be enough to meet the business requirements, as database records usually don't make all that much sense without the logic to glue them together. For instance, knowing that user x inserted a record into the "time_booked" table with a foreign key to the "projects", "users", "time_status" tables may not make all that much sense without the SQL query to glue those 4 tables together.
You may also need to have each database user connect with their own user ID - this is fine with integrated security and a client app, but probably won't work with a website using a connection pool.
The sql server logs are not possible to analyze just like that. There are some 3rd party tools available to read the logs but as far as I know you can't query them for statistics and such. If you need this kind of info you'll have to create some sort of auditing to capture all these events in separate tables. You can use "DDL triggers".

Database Design Theory For Multiple Application Instances

I'm working on a SaaS project that will have each customer having an instance of the application (customer1.application.com, customer2.application.com, etc.) and ideally each customer would have their "own" space in the DB. The current plan is to create a DB for each customer and deploy an instance of the application into the web farm. The idea is that each customer could opt out of an upgrade to maintain the status quo (something one of our investors REALLY wanted, largely in part because he hates how Facebook keeps changing how it works.)
Last night I attempted to roll out, to my two test accounts, an update that altered the database. While the ensuing errors that were caused were my fault (forgetting a small but apparently very important change in the DDL) I'm starting to worry about my overall theory of operation because missing one ALTER COLUMN statement and a whole upgrade cycle could be blown to hell. So after that long build up here's my questions:
1) Is there a way to do a diff between two databases (the "test" production database and a actual production database) that will accurately record each change being made?
2) Is there another database (and/or application) design model I should be considering? I know if I take away supporting multiple versions of the application that I actually remove a lot of the long term support headaches.
Food for thought:
Code upgrades happen more frequently than DB Schema upgrades. Make sure you have a really good SCM in place to handle the code upgrades. We use git with great success.
Code is easy to manage, databases are not (in comparison). The reason is that they are mutable, and change each moment. Plus, they are really hard to roll back (possible, but time consuming, with downtime). So we must arrive at a simple way to track schema updates (along with associated data changes), and be able to apply them in the future to other similar databases.
Each database schema version should be given a unique, sequential integer version number. Start with 100 per say.
Each time you have to upgrade it, write an sql scripts like
100-101.sql
101-102.sql
102-103.sql
It is the job of each script to perform the upgrade for that specific version. It can be as simple as adding a table, or as complicated as re-arranging foreign keys. But in any event, they will be reliable in what they are designed to do.
You can apply any given script many times during testing (on fresh data) to ensure it will work as expected.
So when you find yourself needing to upgrade a client from version 130 to 180, you can safely apply the sql scripts (IN ORDER), and you will arrive at the correct destination.
You should never be changing DBs by hand. Do it by a script that does all DDL changes, etc...
Ideally, there should be a generic DB release script that uses DDL version as configuration/input.
(and DDL changes should be tagged with a specific tag in a versioning system)
You can go Microsoft route re: supporting multiple versions as a headache - simply designate all versions prior to X (say 2 versions back) as un-supported. That way, you can support last 2-3 versions but don't waste resources on anything more, while allowing per-client flexibility to a large extent.
You should carefully weigh pros/cons of having versioned app/DB system like you propose.
List the the pros (such as placating an investor, positive experience for a client when version changes unexpectedly that you mentioned - translated into marginal probability to retain/add new clients who require such feature, plus an easy way to do BETA/UAT testing, plus a failrly wasy way to roll back the schema changes gone awry by loading client's data into DB schema from prior version).
List the cons (cost of DB space, cost of your time to implement, cost of support)
Compare the two and decide which is better for your business.
Redgate's SQL Compare does a good job of comparing and diffing two SQL Server databases (warning: commercial third-party product). Also, I think there's free stuff out there that does much the same thing.
If you want to be able to leave some customers behind on older versions of your product, it might make more sense to maintain a one-database-per-customer model, with the scripts for building each version of the databases under source control. This keeps your customers isolated from each other, and even allows you to switch database vendors (e.g. from SQL Server to Oracle) or versions (i.e. from SQL Server 2000 to Sql Server 2005) on some customers while keeping other customers on the older versions.
Manual run scripts will not work. Nor diff tools, for the matter. Diff works for 2,4 maybe 10 databases. But does not scale, because what you need is reliability in presence of failures (offline databases, server restarting all that).
You deploy by scheduling upgrade scripts. For instance, see how MySpace does this for over 1000 databases: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. the key take is that they use a guaranteed, reliable, delivery mechanism (SSB) to deploy schema maintenance scripts. You need an asynchronous, reliable, mechanism to run scripts because destination databases may be offline, running scheduled maintenance, unreacahbe etc, and a reliable delivery mechanism like Service Broker can handle all the retries and related issues (handling duplicates, acknowledgments etc). You can also look at Asynchronous procedure execution for an example of how to handle script execution via SSB.
As for the scripts themselves, I recommend you start looking at your database schema and configuration data as a versioned resource. I have addresses this problem already several times, eg. see Do you put your database static data into source-control ? How?
Update
I guess I own some explanation why I consider diffing a wrong approach. Just to makes things clear, I'm talking about deployments of hundreds of servers and thousands of databases. The original post compares itself to facebook and I whish them to have the success to reach that size, but also the questions asks about design principles, so I say that discussing about cloud level scale is appropiate.
I see two problems with diff tools:
Availability. All diff tools work by connecting to both the 'master' and the 'copy', so they can do their job only when both are online. This creates a hot spot, a single point of failure, the 'master' copy, whose availability becomes critical for deploying upgrades. High availability always comes at a cost. It also leaves the problem of 'copy' availability as a minor implementation details, the upgrade scheme must handle retries and time outs and disconnects from the client on its own (not a trivial problem by any means).
Atomicity. The diff tools expect a stable schema of the 'master'. This in effect places a freeze on 'master' while an upgrade is taking place. While this can be controlled on a small scale, on large scales it becomes a problem as upgrading the master itself to v. N+1 becomes a race against all the thousands of databases, when some may be still upgrading from v. N-1.
Script based solutions that ship the upgrade script to the 'copy' solve both of these problems. Also diff tools like the VSDB .dbschema based vsdbcmd.exe are better than a 'live' diff tool since the 'master' dbschema file can be delivered to the 'copy' machine and turn the whole upgrade process into a local operation.
Overall I also belive that script based upgrade, using metadata versioning, is supperior to diff based upgrade, because of reasons of testing and source control I had already talked about in the link to Q1525591.
if I take away supporting multiple
versions of the application that I
actually remove a lot of the long term
support headaches
Any change, however small, has a chance of breaking something that is important for someone.
So if you have multiple customers, rolling out a fix for customer 1 will upset customer 2. It doesn't even have to be a bugged release; it might just be a change in behaviour they disagree with. For most customers, not controlling the release schedule is simply unacceptable.
So I'd advise you to keep a different codebase for every customer. Roll out fixes only after agreement with a customer.
There is a number of customers where this approach breaks down (think Yahoo mail), but reading your question I think you're safely below that number. And for a compare tool, I can't help but agree with the posts suggesting Redgate's SQL Compare.

Does Oracle have something like Change Data Capture in SQL Server 2008?

Change Data Capture is a new feature in SQL Server 2008. From MSDN:
Change data capture provides
historical change information for a
user table by capturing both the fact
that DML changes were made and the
actual data that was changed. Changes
are captured by using an asynchronous
process that reads the transaction log
and has a low impact on the system
This is highly sweet - no more adding CreatedDate and LastModifiedBy columns manually.
Does Oracle have anything like this?
Sure. Oracle actually has a number of technologies for this sort of thing depending on the business requirements.
Oracle has had something called Workspace Manager for a long time (8i days) that allows you to version-enable a table and track changes over time. This can be a bit heavyweight, though, because it is based on views with instead-of triggers.
Starting in 11.1 (as an extra cost option to the enterprise edition), Oracle has a Total Recall that asynchronously mines the redo logs for data changes that get logged to a separate table which can then be queried using flashback query syntax on the main table. Total Recall is automatically going to partition and compress the historical data and automatically takes care of purging the data after a specified data retention period.
Oracle has a LogMiner technology that mines the redo logs and presents transactions to consumers. There are a number of technologies that are then built on top of LogMiner including Change Data Capture and Streams.
You can also use materialized views and materialized view logs if the goal is to replicate changes.
Oracle has Change Data Notification where you register a query with the system and the resources accessed in that query are tagged to be watched. Changes to those resources are queued by the system allowing you to run procs against the data.
This is managed using the DBMS_CHANGE_NOTIFICATION package.
Here's an infodoc about it:
http://www.oracle-base.com/articles/10g/dbms_change_notification_10gR2.php
If you are connecting to Oracle from a C# app, ODP.Net (Oracles .Net client library) can interact with Change Data Notification to alert your c# app when Oracle changes are made - pretty kewl. Goodbye to polling repeatedly for data changes if you ask me - just register the table, set up change data notifcation through ODP.Net and wala, c# methods get called only when necessary. woot!
"no more adding CreatedDate and LastModifiedBy columns manually" ... as long as you can afford to keep complete history of your database online in the redo logs and never want to move the data to a different database.
I would keep adding them and avoid relying on built-in database techniques like that. If you have a need to keep historical status of records then use an audit table or ship everything off to a data warehouse that handles slowly changing dimensions properly.
Having said that, I'll add that Oracle 10g+ can mine the log files simply by using flashback query syntax. Examples here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2112847
This technology is also used in Oracle's Datapump export utility to provide consistent data for multiple tables.
I believe Oracle has provided auditing features since 8i, however the tables used to capture the data are rather complex and there is a significant performance impact when this is turned on.
In Oracle 8i you could only enable this for an entire database and not a table at a time, however 9i introduced Fine Grained Auditing which provides far more flexibility. This has been expanded upon in 10/11g.
For more information see http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Also in 11g Oracle introduced the Audit Vault, which provides secure storage for audit information, even DBA's cannot change this data (according to Oracle's documentation, I haven't used this feature yet). More info can be found at http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Oracle has mechanism called Flashback Data Archive. From A Fresh Look at Auditing Row Changes:
Oracle Flashback Query retrieves data as it existed at some time in the past.
Flashback Data Archive provides the ability to track and store all transactional changes to a table over its lifetime. It is no longer necessary to build this intelligence into your application. A Flashback Data Archive is useful for compliance with record stage policies and audit reports.
CREATE TABLESPACE SPACE_FOR_ARCHIVE
datafile 'C:\ORACLE DB12\ARCH_SPACE.DBF'size 50G;
CREATE FLASHBACK ARCHIVE longterm
TABLESPACE space_for_archive
RETENTION 1 YEAR;
ALTER TABLE EMPLOYEES FLASHBACK ARCHIVE LONGTERM;
select EMPLOYEE_ID, FIRST_NAME, JOB_ID, VACATION_BALANCE,
VERSIONS_STARTTIME TS,
nvl(VERSIONS_OPERATION,'I') OP
from EMPLOYEES
versions between timestamp timestamp '2016-01-11 08:20:00' and systimestamp
where EMPLOYEE_ID = 100
order by EMPLOYEE_ID, ts;

Resources