How do I find what is populating a table? - sql-server

I constantly run into this problem. I am working in a data warehouse and I cannot find out what is populating a table. Typically the table is populated daily, either from another table in the warehouse or from an Oracle database. I have tried the query below and can confirm the updates, but I cannot see what is doing them. I have searched the known SSIS packages, the stored procedures with similar names, and the SQL jobs, but I can find nothing.
select object_name(object_id, database_id) as TableName, last_user_update, *
from sys.dm_db_index_usage_stats
where database_id = DB_ID('Warehouse')
and object_id = object_id('PAYMENTS_DAILY')
I only have the most basic SQL Server tools available so no fancy search tools :(

There is no way to tell, after data has been inserted into a table, where the data came from unless some sort of logging is in place.
There are many ways to get that logging: SSIS has its own logging, and you can also use triggers on the tables, change data capture, audit columns, etc.

Frequently, if you know when the row was added, that can help you figure out what process is adding it. Add a new "InsertedDatetime" column to your warehouse table and give it a default value of getdate(). If you know that the rows always come in at 11:15 AM, you can use that to narrow your search.
That will probably be enough information, but if that doesn't help you track down the process, then you can add additional columns that contain everything from a source IP address to a calling object name.
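A minimal sketch of the audit-column idea for SQL Server, using the PAYMENTS_DAILY table from the question. The dbo schema, the constraint names, and every column other than InsertedDatetime are assumptions made for illustration. The columns are nullable so existing rows stay NULL. A loader that inserts without an explicit column list will fail once the table has extra columns, so try this on a copy first.

```sql
-- Assumption: the table lives in the dbo schema of the Warehouse database.
USE Warehouse;
GO

-- Nullable columns with defaults: new rows are stamped automatically,
-- existing rows are left as NULL.
ALTER TABLE dbo.PAYMENTS_DAILY ADD
    InsertedDatetime datetime      NULL
        CONSTRAINT DF_PAYMENTS_DAILY_InsertedDatetime DEFAULT (GETDATE()),
    InsertedByLogin  nvarchar(128) NULL
        CONSTRAINT DF_PAYMENTS_DAILY_InsertedByLogin  DEFAULT (ORIGINAL_LOGIN()),
    InsertedFromHost nvarchar(128) NULL
        CONSTRAINT DF_PAYMENTS_DAILY_InsertedFromHost DEFAULT (HOST_NAME()),
    InsertedByApp    nvarchar(128) NULL
        CONSTRAINT DF_PAYMENTS_DAILY_InsertedByApp    DEFAULT (APP_NAME());
GO

-- After the next load, summarize who inserted rows, from where, and when.
SELECT InsertedByLogin,
       InsertedFromHost,
       InsertedByApp,
       MIN(InsertedDatetime) AS FirstInsert,
       MAX(InsertedDatetime) AS LastInsert,
       COUNT(*)              AS RowsInserted
FROM dbo.PAYMENTS_DAILY
WHERE InsertedDatetime IS NOT NULL
GROUP BY InsertedByLogin, InsertedFromHost, InsertedByApp;
```

The login, host, and application name are often enough to tell an SSIS package from an Agent job or a linked-server load, and the timestamps can be matched against job schedules.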
As a last resort, you could rename your table and create a view named the same and then use an Instead Of Insert trigger on it that just holds open the connection so you can examine the currently executing processes to figure out where it's coming from.
I bet you can figure it out from the time alone though.
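If the time alone does not settle it, the last-resort approach above (rename the table, put a view in its place, and stall inserts in an INSTEAD OF INSERT trigger) could be sketched roughly as follows. The names PAYMENTS_DAILY_base and PAYMENTS_DAILY_trap and the ten-minute delay are hypothetical, and the sketch assumes the table has no identity, computed, or rowversion columns, so that `SELECT *` can pass the rows through unchanged.

```sql
USE Warehouse;
GO

-- 1. Move the real table out of the way (the new name is hypothetical).
EXEC sp_rename 'dbo.PAYMENTS_DAILY', 'PAYMENTS_DAILY_base';
GO

-- 2. Put a view in its place so existing code still resolves the old name.
CREATE VIEW dbo.PAYMENTS_DAILY
AS
SELECT * FROM dbo.PAYMENTS_DAILY_base;
GO

-- 3. Intercept inserts: hold the connection open, then pass the rows through.
CREATE TRIGGER dbo.PAYMENTS_DAILY_trap
ON dbo.PAYMENTS_DAILY
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Keeps the inserting session alive long enough to inspect it.
    WAITFOR DELAY '00:10:00';

    INSERT INTO dbo.PAYMENTS_DAILY_base
    SELECT * FROM inserted;
END;
GO

-- 4. From a second session, while the load is stalled, list the waiting sessions.
SELECT s.session_id,
       s.login_name,
       s.host_name,
       s.program_name,
       r.start_time
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
WHERE r.wait_type = N'WAITFOR';
```

The loading process may hit its own command timeout during the delay, and bulk-load paths do not fire triggers unless they are told to, so this is best tried in a test window. To undo it, drop the view (which also drops the trigger) and rename the base table back.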

Related

In this situation, best practice to create an ORACLE VIEW or TABLE?

In order to report on Oracle logons, I have a query that finds logons to the system. I want to output the query results to a table or view so that I can then report on that table/view. The underlying tables that the query is based on do not keep historical data, hence my need for a new table/view.
The query will run 3 times a day to gather logon information at those times.
Is it best to create a new table and append the daily information, or would a view be better practice? I'm unsure about updating a view, as the underlying tables need to be present, if that's correct.
Thanks
A view won't help at all. It is just a stored query and doesn't contain any data; it just reflects what you have in underlying table(s).
Therefore, you'll need a "history" table which will permanently hold data you're interested in.
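As a rough Oracle sketch of such a history table: the table name logon_history, its columns, and the use of v$session as the source are all assumptions, since the asker's actual logon query is not shown. The INSERT would be replaced by that query and run three times a day, for example from a scheduler job.

```sql
-- Hypothetical history table; column sizes are deliberately generous.
CREATE TABLE logon_history (
    captured_at DATE DEFAULT SYSDATE NOT NULL,  -- when the snapshot was taken
    username    VARCHAR2(128),
    osuser      VARCHAR2(128),
    machine     VARCHAR2(128),
    program     VARCHAR2(128),
    logon_time  DATE
);

-- Append the current logons; run this at each scheduled capture time.
-- Assumption: v$session stands in for the asker's own logon query.
INSERT INTO logon_history (captured_at, username, osuser, machine, program, logon_time)
SELECT SYSDATE, username, osuser, machine, program, logon_time
FROM v$session
WHERE username IS NOT NULL;   -- skip background processes

COMMIT;
```

Reports can then query logon_history directly, or go through a view defined on top of it if a particular shape is needed.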

Data warehouse: Figuring out what rows changed of a sql server table to facilitate data warehouse

BUSINESS SCENARIO, SEEKING A WAY TO PROGRAM THIS:
Every night, I have to update table ABC in the data warehouse database from the production database. The table has millions of rows, so I want to do this efficiently.
The table doesn't have any sort of timestamp marker (LastUpdated Date\Time).
The database was created by our vendor, whose software we run, and they are giving us visibility into our data. We may not have much leverage in terms of asking for new columns to hold information such as a LastUpdate DateTime stamp.
Is there a way, absent such information, to identify the rows that have changed or been added?
For example, is there such a thing as a queryable physical row number associated with each table record that might help us work towards a solution? If that could be queried, and if it increases sequentially, then maybe there is a way to get the inserted rows.
Updated rows, I am not so sure.
Just entertaining ideas at this point in time to see if there is an efficient solution for this scenario.
Ideally, the solution will be geared towards a stored procedure we can have run every night by a job.
Thank you.
I saw this comment but I am not so sure that the solution is efficient:
Find changed rows (composite key with nulls)
Please check the MERGE statement. You can create a SQL Server Agent job that executes a MERGE script to detect the changes and apply them, if there are any.
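A minimal sketch of such a MERGE, wrapped in a stored procedure that a nightly job could call. The database names Production and Warehouse, the procedure name, the key column Id, and the columns Col1 and Col2 are hypothetical, since the real layout of ABC is not given. The `EXISTS (... EXCEPT ...)` test is one common way to compare rows so that NULLs are treated as equal to each other.

```sql
CREATE PROCEDURE dbo.Sync_ABC
AS
BEGIN
    SET NOCOUNT ON;

    MERGE Warehouse.dbo.ABC AS tgt
    USING Production.dbo.ABC AS src
       ON tgt.Id = src.Id                      -- assumed primary key
    WHEN MATCHED AND EXISTS (
            -- True only when at least one column differs (NULL-safe).
            SELECT src.Col1, src.Col2
            EXCEPT
            SELECT tgt.Col1, tgt.Col2
         )
        THEN UPDATE SET Col1 = src.Col1,
                        Col2 = src.Col2
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (Id, Col1, Col2)
             VALUES (src.Id, src.Col1, src.Col2);
END;
```

This still reads both tables in full, so it reduces writes rather than reads. A `WHEN NOT MATCHED BY SOURCE THEN DELETE` branch could be added if rows removed from production should also disappear from the warehouse.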

Update table as same table in another database changes

I have two databases in one instance of SQL server and they have the same structure.
Now I want to write triggers on some of the tables so that the databases stay in sync with each other whenever records are inserted, updated or deleted.
One of the triggers would look something like this:
CREATE TRIGGER AdminMessage_Insert
ON AdminMessage
AFTER INSERT
AS
    INSERT INTO SecondDb.dbo.AdminMessage
        ( ID ,
          DeptKey ,
          AdminKey ,
          ReceiverKey ,
          MessageText ,
          IsActive
        )
    SELECT i.ID, i.DeptKey, i.AdminKey, i.ReceiverKey, i.MessageText, i.IsActive
    FROM INSERTED i
My problem is that there are many tables, and writing about three triggers for each of them doesn't seem to be the best solution.
Can you give me a better and smaller approach?
UPDATE
I found some options such as CDC, Change Tracking, SQL Audit and, of course, Replication (snapshot replication), and read about them.
As I understand it, the best solution for me is 'CDC' or 'Audit'.
With both of them, I must set up each table one by one, which takes a lot of my time.
Can I capture changes for all tables with less work and with one SQL Server instance? (Replication is good, but it needs more than one instance.)
What do you suggest?
While Change Data Capture (CDC) wasn't designed to be used as a sort of replication, we use it this way at my company because it works for us. You enable CDC only for the specific tables you need, and you can then read just the net changes. The change records are stored in change tables that CDC creates in the cdc schema of the same database. From there you can push the changes to the other database. You can find more information about CDC here.
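A rough sketch of that setup for the AdminMessage table from the question. FirstDb is a hypothetical name for the source database. Net-change queries need a primary key or unique index on the table, and the capture job needs SQL Server Agent to be running.

```sql
USE FirstDb;   -- hypothetical name of the source database
GO

-- Enable CDC for the database, then for each table to be tracked.
EXEC sys.sp_cdc_enable_db;
GO

EXEC sys.sp_cdc_enable_table
     @source_schema        = N'dbo',
     @source_name          = N'AdminMessage',
     @role_name            = NULL,   -- no gating role
     @supports_net_changes = 1;      -- creates the net-changes function
GO

-- Later, once some changes have been captured, read the net changes.
-- A real sync job would remember the last LSN it processed instead of
-- starting from the minimum each time.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_AdminMessage');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT __$operation,   -- 1 = delete, 2 = insert, 4 = update
       ID, DeptKey, AdminKey, ReceiverKey, MessageText, IsActive
FROM cdc.fn_cdc_get_net_changes_dbo_AdminMessage(@from_lsn, @to_lsn, N'all');
```

Each table still has to be enabled individually, but the call is uniform, so it could be generated from sys.tables rather than written by hand.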
Because it seems like you are looking for a solution that is only replicating the data one way, can I assume that the second source is read-only? If so, and because you said both databases are on the same instance, you can use synonyms in your secondary database.
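If the second database really is read-only, the synonym approach might look like the sketch below. FirstDb is again a hypothetical name for the source database, and AdminMessage_old is a hypothetical name for the retired copy.

```sql
USE SecondDb;
GO

-- Keep the old copy under another name rather than dropping it outright.
EXEC sp_rename 'dbo.AdminMessage', 'AdminMessage_old';
GO

-- Queries against SecondDb.dbo.AdminMessage now read FirstDb's table directly.
CREATE SYNONYM dbo.AdminMessage FOR FirstDb.dbo.AdminMessage;
GO

SELECT TOP (10) ID, MessageText
FROM dbo.AdminMessage;
```

Nothing is copied, so there is nothing to keep in sync, but callers need permission on the table in the source database.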

Add DATE column to store when last read

We want to know which rows in a certain table are used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive. (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is a classic job for the application server. If you want your own tracking architecture, implement it in your own layer.
In any case, you will need an application server for this. You are not going to update the tracking field in the same transaction as the SELECT, are you? What about rollbacks? So you end up with some manager component that first runs the SELECT and then writes the tracking information. And what is the point of storing the tracking information together with the entity data and sending it back to the database? Save it to a file on the application server instead.
You could either update the column in the table as you suggested, but if it was me I'd log the event to another table, i.e. id of the record, datetime, userid (maybe ip address etc, browser version etc), just about anything else I could capture and that was even possibly relevant. (For example, 6 months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage pattern is etc).
This type of information can be useful down the road for things you've never even thought of, and if the table starts to grow large you can always roll it up and prune it to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never regret having it, and it is impossible to re-create after the fact.
In terms of making sure the application doesn't slow down, you may want to 'select' the data from within a stored procedure, that also issues the logging command, so that the client is not doing two roundtrips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async ajax call to issue the logging action, which wouldn't slow down the user's experience at all.
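The stored-procedure variant described above could be sketched like this. The tables dbo.Items and dbo.ItemReadLog, their columns, and the procedure name are hypothetical. The syntax sticks to features that older SQL Server versions also support.

```sql
-- Hypothetical log table: one row per read of a record.
CREATE TABLE dbo.ItemReadLog (
    ItemId int           NOT NULL,
    ReadAt datetime      NOT NULL DEFAULT (GETDATE()),
    ReadBy nvarchar(128) NOT NULL DEFAULT (SUSER_SNAME())
);
GO

-- One round trip: the procedure logs the read and returns the row.
CREATE PROCEDURE dbo.GetItem
    @ItemId int
AS
BEGIN
    SET NOCOUNT ON;

    -- Log only ids that actually exist in the (assumed) dbo.Items table.
    INSERT INTO dbo.ItemReadLog (ItemId)
    SELECT Id
    FROM dbo.Items
    WHERE Id = @ItemId;

    SELECT Id, Name
    FROM dbo.Items
    WHERE Id = @ItemId;
END;
GO

-- Usage: most-read items first.
SELECT ItemId, COUNT(*) AS Reads, MAX(ReadAt) AS LastRead
FROM dbo.ItemReadLog
GROUP BY ItemId
ORDER BY Reads DESC;
```

Rows that never appear in the log are the ones that are never used, which a LEFT JOIN from dbo.Items to the log would reveal.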
Adding a new column to track SELECTs is not good practice, because it may affect database performance, and performance is one of the most critical concerns in database server administration.
Instead, you can use a very good database feature called Auditing. It is easy to set up and puts less stress on the database.
For more information, search for "database auditing for SELECT statements".
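For reference, a SQL Server Audit that records SELECTs on one table might look like the sketch below. The audit names, the file path D:\Audit\, the database MyDb, and the table dbo.Items are hypothetical. The feature needs SQL Server 2008 or later, and database-level audit specifications may be limited to certain editions on older releases, so this may not be available on the asker's version. It records statements, not the individual rows they returned.

```sql
USE master;
GO

-- Server-level audit object that writes to files (the path is an assumption).
CREATE SERVER AUDIT ReadAudit
TO FILE (FILEPATH = N'D:\Audit\');
GO
ALTER SERVER AUDIT ReadAudit WITH (STATE = ON);
GO

USE MyDb;      -- hypothetical database
GO

-- Record every SELECT against the table, whoever issues it.
CREATE DATABASE AUDIT SPECIFICATION ReadAuditSpec
FOR SERVER AUDIT ReadAudit
ADD (SELECT ON OBJECT::dbo.Items BY public)
WITH (STATE = ON);
GO

-- Read the audit trail back.
SELECT event_time, server_principal_name, statement
FROM sys.fn_get_audit_file(N'D:\Audit\*.sqlaudit', DEFAULT, DEFAULT);
```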
Use another table as a key/value pair with two columns(e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time the records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query in the counting table. E.g. as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1='somevalue';
INSERT INTO countTable(id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1='somevalue' # or just build a list of ids as values from your last result
ON DUPLICATE KEY UPDATE times=times+1;
The ON DUPLICATE KEY syntax is MySQL, written off the top of my head. To conditionally insert or update in MSSQL you would need to use MERGE instead.
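A sketch of the MERGE equivalent in T-SQL, reusing the myTable and countTable names from the example above. It assumes the dbo schema and that id is unique in myTable, because MERGE fails if several source rows match the same target row.

```sql
MERGE dbo.countTable AS tgt
USING (
        SELECT id
        FROM dbo.myTable
        WHERE stuff1 = 'somevalue'
      ) AS src
   ON tgt.id_selected = src.id
WHEN MATCHED
    THEN UPDATE SET times = tgt.times + 1           -- seen before: bump the counter
WHEN NOT MATCHED BY TARGET
    THEN INSERT (id_selected, times)
         VALUES (src.id, 1);                        -- first time this id is selected
```

MERGE needs SQL Server 2008 or later. On older versions the same effect can be had with an UPDATE of the existing ids followed by an INSERT ... WHERE NOT EXISTS for the new ones.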

Incremental table names

I'm currently trying to find some information with a query that will automatically update on a website every 24 hours - this involves a number of columns in a daily backup table that I need to pull information from.
The problem I have is the query has to specifically state the database table, which makes the query rather static - and I need something a bit more dynamic.
I have a daily backup table with the naming system as follows:
daily_backup_130328
daily_backup_130329
daily_backup_130330
daily_backup_130331
daily_backup_130401
daily_backup_130402
So when I state my FROM table, I name one of these, usually the latest one available (so "daily_backup_130402" from the example list). Currently the only way I can keep this up to date is to manually go in and edit the query every day before the scheduled run.
My question is: is there a way that I can get it to select the latest "daily_backup_??????" table automatically?
Edit: I'm talking about bog-standard queries like "SELECT * FROM daily_backup_130402 ORDER BY CheeseType ASC;"
A generic answer: you could implement a daily_backup stored procedure and have the website call that instead of naming a table.
How the procedure finds the latest table depends on the server/driver, which should provide its own catalog and dynamic-query facilities.
There is a chance that the software that creates daily_backup_NNNNNN could also update the stored procedure, relieving you of the manual work altogether. Again, that could be done dynamically, depending on server details (triggers on metadata).
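The question does not say which database server is in use. Assuming SQL Server and the dbo schema purely for illustration, a daily_backup procedure that looks up the newest table in the catalog and queries it with dynamic SQL could be sketched as:

```sql
CREATE PROCEDURE dbo.daily_backup
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @table sysname,
            @sql   nvarchar(max);

    -- YYMMDD suffixes sort correctly as text, so the highest name is the latest.
    SELECT TOP (1) @table = name
    FROM sys.tables
    WHERE schema_id = SCHEMA_ID('dbo')
      AND name LIKE 'daily[_]backup[_][0-9][0-9][0-9][0-9][0-9][0-9]'
    ORDER BY name DESC;

    IF @table IS NULL
    BEGIN
        RAISERROR('No daily_backup_NNNNNN table found.', 16, 1);
        RETURN;
    END;

    -- QUOTENAME guards the dynamically built identifier.
    SET @sql = N'SELECT * FROM dbo.' + QUOTENAME(@table)
             + N' ORDER BY CheeseType ASC;';

    EXEC sys.sp_executesql @sql;
END;
```

The scheduled query then becomes `EXEC dbo.daily_backup;` and no longer needs editing each day. Other servers expose similar catalog views (for example information_schema.tables) and their own form of dynamic SQL.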

Resources