Is there metadata I can read from SQL Server to know the last changed row/table? - sql-server

We have a database with hundreds of tables.
Is there some kind of metadata source in SQL Server that I can programmatically query to get the name of the last changed table and row?
Or do we need to implement this ourselves with fields in each table called LastChangedDateTime, etc.?

In terms of finding out when a table last had a modification, there is a sneaky way to get at this information, but it will not tell you which row was altered, just when.
SQL Server maintains index usage statistics, and records the last seek / scan / lookup and update on an index. It also splits this by user / system.
Filtering that to just the user tables, any insert / update / deletion will cause an update to occur on the index, and the DMV will update with this new information.
select o.name,
       max(u.last_user_seek) as LastSeek,
       max(u.last_user_scan) as LastScan,
       max(u.last_user_lookup) as LastLookup,
       max(u.last_user_update) as LastUpdate
from sys.dm_db_index_usage_stats u
inner join sys.objects o on o.object_id = u.object_id
where u.database_id = db_id() -- the DMV is server-wide; object_ids can collide across databases
  and o.type = 'U'
group by o.name
It is not ideal, however: the statistics are cleared whenever the server restarts, and I have never considered using it for production code as a tracking mechanism, only as a forensic tool to check for obvious alterations.
If you want proper row-level alteration tracking you will either have to build it in yourself, or look at the Change Data Capture feature introduced in SQL Server 2008.
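For reference, a minimal sketch of enabling Change Data Capture; the database and table names here are placeholders, and CDC additionally requires SQL Server Agent to be running (and, in SQL Server 2008, Enterprise edition):
USE YourDatabase;
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'YourTable',
    @role_name = NULL; -- NULL = no gating role required to read the change data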

The [sys].[tables] view will tell you when the table was created and last modified (in terms of schema changes, not inserts, updates, or deletes). To my knowledge there is no built-in record of the last modification of each row in the database (it would take up a lot of space anyway, so it's probably good not to have it). So you should add a last-modified field yourself, and maybe have it updated automatically by a trigger, as sketched below.
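A minimal sketch of such a trigger, assuming a hypothetical table dbo.YourTable with a single-column key Id and a LastChangedDateTime column:
CREATE TRIGGER trg_YourTable_LastChanged
ON dbo.YourTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Stamp only the rows touched by the triggering statement
    UPDATE t
    SET LastChangedDateTime = GETDATE()
    FROM dbo.YourTable t
    INNER JOIN inserted i ON i.Id = t.Id;
END;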

Depending on the recovery model, you might be able to get this from the transaction log using fn_dblog: http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-6-Using-fn_dblog-to-tell-if-a-transaction-is-contained-in-a-backup.aspx
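fn_dblog is undocumented and only shows records still present in the active portion of the log, but a quick forensic query might look like this (column and operation names as they appear in the function's output):
SELECT [Current LSN], Operation, AllocUnitName, [Transaction ID]
FROM fn_dblog(NULL, NULL) -- NULL, NULL = no LSN range restriction
WHERE Operation IN ('LOP_INSERT_ROWS', 'LOP_MODIFY_ROW', 'LOP_DELETE_ROWS');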

Related

How to systematically manage a Big List of Queries and Tables in SQL Server?

Suppose someone has to work on a lot of different SQL Server databases, each containing a lot of tables and queries / views.
After a period of time, it becomes very difficult to remember exactly what columns are present within a given table or view.
Please suggest some method by which one can keep a systematic list of all the tables and views present within a SQL Server database, along with the columns present within them.
Are there any add-on products or services available that help make this type of work systematic?
Currently I add comments to each query inside SQL Server to remind me of what the query is doing, but this method is not great. I am looking for better and more efficient methods.
Please share any ideas that you might have in this direction.
Thanks a lot
You may find the following useful for each database.
select s.name, s.type, c.name, s.refdate
from syscolumns c
inner join sysobjects s on s.id = c.id
where s.xtype in ('U', 'V')
order by s.refdate -- use refdate for manual quick looks
                   -- use s.name for file output and long-term analysis
I output this to text files with the exact same format and check them into source control for each database. I even add comments about fields as things change. This is not part of the formal process; it is just sanity-check, big-picture version tracking, independent of the formal deployments.
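Note that syscolumns and sysobjects are deprecated compatibility views; on SQL Server 2005 and later a roughly equivalent query against the newer catalog views might look like this (a sketch, untested against your schema):
select o.name as object_name, o.type_desc, c.name as column_name, o.modify_date
from sys.columns c
inner join sys.objects o on o.object_id = c.object_id
where o.type in ('U', 'V')
order by o.modify_date; -- or o.name for file output and long-term analysis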

SQL Server: figure out the tables behind a report sent as a daily email? (I have access to the db)

I have access to the database in SQL Server Management Studio, I can see all the tables.
We have a daily report sent by email - however we want to know what the SQL query behind the report is, as we cannot get hold of the developers.
Hence I found the foreign keys and which primary keys they are related to, but halfway through I've come across columns that don't seem to have a key associated with them.
I do not have the time to go through 150+ tables.
How can I find out which table a value has come from without a key? Should there always be a key? Can I search through the entire database, all of the tables, for a value in that column, so I can find the offending tables wherever they are?
Help - it's like reverse engineering and it's taking too long... please.
On a Microsoft SQL Server you can use SQL Server Profiler to log all DB queries. If you know the time of the day the report is populated, run the trace at that time, and you'll be able to see the exact SQL statements used for it.
See https://youtu.be/IaxG6jbNuj8
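On SQL Server 2008 and later, an Extended Events session is a lighter-weight alternative to Profiler; a minimal sketch, where the session, database, and file names are placeholders:
CREATE EVENT SESSION CaptureReportQueries ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (ACTION (sqlserver.client_app_name, sqlserver.sql_text)
     WHERE sqlserver.database_name = 'YourReportDb')
ADD TARGET package0.event_file (SET filename = N'CaptureReportQueries.xel');

ALTER EVENT SESSION CaptureReportQueries ON SERVER STATE = START;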
If the report is generated from a stored procedure, then finding the stored procedure would give you all of that info.
This might help you find the stored procedure:
select *
from sysobjects so
inner join syscomments sc on so.id = sc.id
where sc.text like '%columnname%'
and xtype = 'P'
Just put in some search strings (maybe the column outputs) between the % signs.
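Note that syscomments stores object definitions in 4000-character chunks, so a LIKE search can miss a match that spans a chunk boundary; on SQL Server 2005 and later, sys.sql_modules holds the full definition and might be searched like this:
select object_schema_name(object_id) as schema_name,
       object_name(object_id) as object_name
from sys.sql_modules
where definition like '%columnname%';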

How Do I find what is populating a table?

I constantly run into this problem. I am working in a data warehouse and I cannot find out what is populating a table. Typically the table is being populated on a daily basis, either from other tables in the warehouse or from an Oracle database. I have tried the query below and can confirm the updates, but I cannot see what is doing them. I have searched the known SSIS packages and stored procedures with similar names, and the SQL jobs, but I can find nothing.
select object_name(object_id) as TableName, last_user_update, *
from sys.dm_db_index_usage_stats
where database_id = DB_ID('Warehouse')
and object_id = object_id('PAYMENTS_DAILY')
I only have the most basic SQL Server tools available so no fancy search tools :(
There is no way to tell, after data has been inserted into a table, where the data came from without having some sort of logging in place.
SSIS has logging; triggers on the tables, change data capture, and audit columns are among the many ways to do this.
Frequently, if you know when the row was added, that can help you figure out what process is adding it. Add a new "InsertedDatetime" column to your warehouse table and give it a default value of getdate(), as sketched below. If you know that the rows always come in at 11:15 AM, you can use that to narrow your search.
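A minimal sketch, using the PAYMENTS_DAILY table from the question (the constraint name is illustrative; existing rows are back-filled with the current date):
ALTER TABLE dbo.PAYMENTS_DAILY
ADD InsertedDatetime datetime NOT NULL
    CONSTRAINT DF_PaymentsDaily_InsertedDatetime DEFAULT (GETDATE());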
That will probably be enough information, but if that doesn't help you track down the process, then you can add additional columns that contain everything from a source IP address to a calling object name.
As a last resort, you could rename your table, create a view with the old name, and then put an INSTEAD OF INSERT trigger on the view that just holds open the connection so you can examine the currently executing processes and figure out where it's coming from.
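A hedged sketch of that last-resort trick; the base table, view, and audit table names are hypothetical, and instead of holding the connection open this version simply records who is inserting before passing the rows through:
EXEC sp_rename 'dbo.PAYMENTS_DAILY', 'PAYMENTS_DAILY_BASE';
GO
CREATE VIEW dbo.PAYMENTS_DAILY AS
    SELECT * FROM dbo.PAYMENTS_DAILY_BASE;
GO
CREATE TRIGGER trg_PaymentsDaily_TraceSource
ON dbo.PAYMENTS_DAILY
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Record the session context of whatever is doing the insert
    INSERT INTO dbo.PaymentsDailyAudit (InsertedAt, LoginName, HostName, AppName)
    SELECT GETDATE(), ORIGINAL_LOGIN(), HOST_NAME(), APP_NAME();

    -- Pass the rows through to the renamed base table
    -- (assumes no identity column; list columns explicitly if there is one)
    INSERT INTO dbo.PAYMENTS_DAILY_BASE
    SELECT * FROM inserted;
END;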
I bet you can figure it out from the time alone though.

SSIS - How to improve a work flow

Previously I asked for a possible solution to a situation I had to face in order to implement a SQL query (which was originally implemented in Access). I have reached a solution (after asking a lot), but I would like to know if anyone has another way to execute this query.
I have two different tables, one in SQL Server and another in Oracle (S and O):
O(A, B, C) => PK = (A, B) and S(D, E, F) => PK = (D, E)
The query looks like this
SELECT A, B, C, E, F
FROM S INNER JOIN O
ON S.D = O.A -- only one attribute of the PK of O
S has over 10,000 rows and O more than 700 million. Given this, it is not logical to implement a merge join or a lookup, because I would only get the first match between D and A.
So I thought it would be better to assemble the query on the Oracle side. To do this I have implemented a scheme like this.
On the SQL Server side I have executed this query:
with tmp(A) as (
    select distinct D as A from S
)
select cast((select concat(' or A = ', A)
             from tmp
             for xml path('')) as nvarchar(max)) as ID
This gives me a string with the values that I am going to search for in Oracle.
Finally, in the data flow, I am creating an expression like this:
select A, B, C
from O
where A= '' + #ID
I download these values to SQL Server and then I can manipulate them as I wish.
The use of the foreach loop was necessary because I am storing the SQL string inside an object variable. I found that SSIS has some trouble with nvarchar(max) variables.
Some considerations:
1) The Oracle database is administered by another area of the company and they only grant read permissions on the tables.
2) The DBA of the SQL Server does not allow the O table to be downloaded to a staging area. There is no possibility of negotiating with him; besides, this table is updated every day with more rows. He only manages this server and has no authority over Oracle.
3) The solution suggested by some members of my team was to create a query in Oracle between different tables that could give me the attributes of O that I need; as a result I could get more than 3 million rows, and not all of the values of A are present in S. What's more, some of the values of D have been manipulated, so they may not be present in O.
With this implementation I am getting more than 150,000 rows from Oracle. But I would like to know if another solution can be implemented, or if there are other components I can use to reach the same results. Believe me when I say that I have read, asked, and searched a lot before implementing this flow.
EDITED:
Option 1 (You say that you cannot use this solution – but it would be the first one – the best)
Use a DBLink to let Oracle access the S table (you must use Oracle Database Gateway). Create a view in Oracle joining O and S. And finally use a linked server to let SQL Server access the Oracle joining view and get the results.
The process is as follows:
You must convince your Oracle DBA to configure the Oracle Database Gateway for SQL Server (see http://docs.oracle.com/cd/B28359_01/gateways.111/b31043/conf_sql.htm#CIHGADGB). When it is properly configured, you can create a DBLink from Oracle to SQL Server. With the DBLink, Oracle will have direct access to the S table.
Now create a view V in Oracle just joining the O and S tables.
As you want the result back in your SQL Server and you cannot use SSIS, you can then proceed as described in: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/111df59c-b309-4d59-b56c-9cd5574ee181/how-to-access-oracle-table-from-sql-server-?forum=transactsql
Option 2 (You say that you cannot use this solution – but it would be the second one)
As your Oracle admins seem to be monsters that will kill you if they get their paws on you, you can try the following (if they let you create a table in Oracle):
Create a linked server in SQL Server (to access Oracle from SQL Server), as mentioned in the "normal case" above.
Create a (temporary) table in the Oracle schema with only one column (it will store the D values coming from SQL Server).
Every time you need to evaluate your query, execute in SQL Server:
INSERT INTO ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE
SELECT DISTINCT D FROM S;

SELECT * FROM OPENQUERY(ORACLE_LINKED_SERVER,
    'SELECT * FROM ORACLE_OWNER.O WHERE A IN (SELECT D FROM ORACLE_OWNER.TEMP_TABLE)');
And finally, don't forget to empty the Oracle temp table:
DELETE FROM ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE;
Option 3 (If you have an Oracle license and one available host)
You can install your own Oracle server in your host and use Option 2.
Option 4
If your solution is really the only way out, then let's try to improve it a little bit.
As you know, your solution works but it is a little aggressive (you are transforming a relational-algebra semijoin operator into a selection operator with a monster condition). You say that the Oracle table is updated every day with more rows, but if the update rate of your tables is lower than your query rate, then you can create a result cache that you can use while the tables S and O have not changed.
Proceed as follows:
Create a table in your SQL Server to store the Oracle result of your monster query. Before building and launching your query, execute this:
SELECT MAX(last_user_update) -- MAX because the DMV returns one row per index
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
AND object_id = OBJECT_ID('S')
This returns the most recent time your table S was updated. Store this value in a table (create a new table, or store this value in a typical parameter table).
Create your monster query. But before launching it, send this query to Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
It returns the last SCN (System Change Number) that caused a change in the table. Store this value in a table (create a new table, or store this value in a typical parameter table).
Launch the big query and store its result into the cache table.
Finally, when you need to repeat the big query, first execute in your SQL Server:
SELECT MAX(last_user_update)
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
AND object_id = OBJECT_ID('S')
And execute in Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
If one or both values have changed with respect to the ones stored in your parameter table, then you must update the stored values and launch the big query again. But if neither value has changed, your cache is up to date and you can use it. A minimal sketch of this check is below.
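A sketch of that comparison on the SQL Server side, assuming a hypothetical one-row parameter table dbo.CacheParams(SLastUpdate, OLastScn), a cache table dbo.OResultCache, and that the Oracle SCN has already been fetched into @currentScn via the linked server:
DECLARE @currentSUpdate datetime, @currentScn bigint;

SELECT @currentSUpdate = MAX(last_user_update)
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
AND object_id = OBJECT_ID('S');

-- @currentScn comes from executing SELECT MAX(ORA_ROWSCN) FROM O; in Oracle

IF EXISTS (SELECT 1 FROM dbo.CacheParams
           WHERE SLastUpdate = @currentSUpdate AND OLastScn = @currentScn)
    SELECT * FROM dbo.OResultCache; -- cache is up to date
ELSE
BEGIN
    UPDATE dbo.CacheParams
    SET SLastUpdate = @currentSUpdate, OLastScn = @currentScn;
    -- ... re-run the big Oracle query and repopulate dbo.OResultCache ...
END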
Note that:
SCN is not absolutely precise, but it is a good approximation (see http://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns007.htm).
The greater your query rate relative to your update rate, the better this solution works.
If you can tolerate working with stale values, you can improve the cache with an expiration time.

Audit each inserted row in a Trigger

I am trying to do an audit history by adding triggers to my tables and inserting rows into my Audit table. I have a stored procedure that makes doing the inserts a bit easier because it saves code; I don't have to write out the entire insert statement, but instead execute the stored procedure with a few parameters for the columns I want to insert.
I am not sure how to execute a stored procedure for each of the rows in the "inserted" table. I think maybe I need to use a cursor, but I'm not sure. I've never used a cursor before.
Since this is an audit, I am going to need to compare the value for each column old to new to see if it changed. If it did change I will execute the stored procedure that adds a row to my Audit table.
Any thoughts?
I would trade space for time and not do the comparison. Simply push the new values to the audit table on insert/update. Disk is cheap.
Also, I'm not sure what the stored procedure buys you. Can't you do something simple in the trigger like:
insert into dbo.mytable_audit
select *, getdate(), getdate(), 'create' from inserted
where the trigger runs on insert and you are adding created time, last updated time, and modification type fields. For an update it's a little trickier, since you'll need to supply an explicit column list, as the created time shouldn't be updated:
insert into dbo.mytable_audit (col1, col2, ..., last_updated, modification)
select col1, col2, ..., getdate(), 'update' from inserted
Also, are you planning to audit only successes, or failures as well? If you want to audit failures, you'll need something other than triggers, I think, since a trigger won't run if the transaction is rolled back - and you won't have the status of the transaction if the trigger runs first.
I've actually moved my auditing to my data access layer and do it in code now. It makes it easier to do both success and failure auditing, and (using reflection) it is pretty easy to copy the fields to the audit object. The other thing it allows me to do is record the user context, since I don't give the actual users permissions to the database and run all queries using a service account.
If your database needs to scale past a few users this will become very expensive. I would recommend looking into 3rd party database auditing tools.
There is already a built-in function, UPDATE(), which tells you if a column has been set (though it applies over the entire set of inserted rows).
You can look at some of the techniques in Paul Nielsen's AutoAudit triggers, which are code-generated.
What it does is check both:
IF UPDATE(<column_name>)
    INSERT Audit (...)
    SELECT ...
    FROM Inserted
    JOIN Deleted
        ON Inserted.KeyField = Deleted.KeyField
           -- (AutoAudit does not support multi-column primary keys,
           --  but the technique can be done manually)
        AND NOT (Inserted.<column_name> = Deleted.<column_name>
                 OR COALESCE(Inserted.<column_name>, Deleted.<column_name>) IS NULL)
But it audits each column change as a separate row. I use it for auditing changes to configuration tables. I am not currently using it for auditing heavy-change tables. (But in most transactional systems I've designed, rows in heavy-activity tables are typically immutable: you don't have a lot of UPDATEs, just a lot of INSERTs, so you wouldn't even need this kind of auditing.) For instance, orders or ledger entries are never changed, and shopping carts are disposable; neither would have this kind of auditing. On low-volume change tables, like customer, you can use this kind of auditing; a concrete sketch of the pattern follows.
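For illustration, a concrete instance of the pattern for a hypothetical dbo.Customer table (key CustomerId, audited column Email, audit table dbo.Audit); the change test is written out explicitly so that NULL-to-value and value-to-NULL transitions are also caught:
CREATE TRIGGER trg_Customer_Audit_Email
ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF UPDATE(Email)
        INSERT INTO dbo.Audit (TableName, KeyValue, ColumnName, OldValue, NewValue, ChangedAt)
        SELECT 'Customer', i.CustomerId, 'Email', d.Email, i.Email, GETDATE()
        FROM Inserted i
        JOIN Deleted d ON d.CustomerId = i.CustomerId
        WHERE i.Email <> d.Email
           OR (i.Email IS NULL AND d.Email IS NOT NULL)
           OR (i.Email IS NOT NULL AND d.Email IS NULL);
END;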
Jeff,
I agree with Zodeus: a good option is to use a 3rd-party tool.
I have used auditdatabase, a free web tool that generates audit triggers (you do not need to write a single line of T-SQL code).
Another good tool is ApexSQL Audit, but it's not free.
I hope this helps you,
F. O'Neill
