How are tables getting updated in BigQuery?

Is there any way to check how a table is getting updated in BigQuery?
Scenario:
Data coming from Google Firebase is dumped into table X in BigQuery. Once table X is complete, the entire data set is migrated to table Y. After table Y is created it shouldn't be updated, but it is getting updated.
I couldn't find any documentation or any way to find out how my table is getting updated.
Please let me know if you require any more info.
Thanks

Go to Cloud Logging and filter on the BigQuery resource to see the log entries for your table.
These entries are audit logs. Each one also shows WHO performed the action, in the principalEmail field.
Then you have the details of the operation. In my example it was a SELECT, but you can also see whether it's an insert, an update, and so on. The query capabilities of Cloud Logging are very powerful!
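To narrow the logs down to writes against one table, a filter along these lines may help. This is only a sketch: the project and dataset names are made up, and the field names (resource.type="bigquery_dataset", protoPayload.resourceName, protoPayload.metadata.tableDataChange) are my understanding of the newer BigQuery audit log format, so check them against an actual entry in your project before relying on them.

```python
def table_change_filter(project: str, dataset: str, table: str) -> str:
    """Build a Cloud Logging filter for audit entries that changed one table's data."""
    resource_name = f"projects/{project}/datasets/{dataset}/tables/{table}"
    clauses = [
        # Assumed resource type for table-level audit entries.
        'resource.type="bigquery_dataset"',
        f'protoPayload.resourceName="{resource_name}"',
        # ":*" keeps only entries where this field is present (assumed field name).
        "protoPayload.metadata.tableDataChange:*",
    ]
    return " AND ".join(clauses)


# "my-project" and "analytics" are placeholders; "Y" is the table from the question.
print(table_change_filter("my-project", "analytics", "Y"))
```

Paste the printed filter into the Logs Explorer query box. The matching entries should carry the principalEmail mentioned above, which tells you which user or service account is writing to table Y.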

Related

How Do I find what is populating a table?

I constantly run into this problem. I am working in a data warehouse and I cannot find out what is populating a table. Typically the table is populated daily, either from another table in the warehouse or from an Oracle database. I have tried the query below and can confirm the updates, but I cannot see what is doing them. I searched the known SSIS packages, the stored procedures with similar names, and the SQL jobs, but I can find nothing.
-- last_user_update shows when the table was last written to, not who wrote it
select object_name(object_id, database_id) as TableName, last_user_update, *
from sys.dm_db_index_usage_stats
where database_id = DB_ID('Warehouse')
and object_id = object_id('PAYMENTS_DAILY')
I only have the most basic SQL Server tools available so no fancy search tools :(
There is no way to tell, after data has been inserted into a table, where the data came from without having some sort of logging.
There are many ways to do this: SSIS has logging, and you can use triggers on the tables, change data capture, audit columns, etc.
Frequently, if you know when the row was added, that can help you figure out what process is adding it. Add a new "InsertedDatetime" column to your warehouse table and give it a default value of getdate(). If you know that the rows always come in at 11:15 AM, you can use that to narrow your search.
That will probably be enough information, but if that doesn't help you track down the process, then you can add additional columns that contain everything from a source IP address to a calling object name.
As a last resort, you could rename your table and create a view named the same and then use an Instead Of Insert trigger on it that just holds open the connection so you can examine the currently executing processes to figure out where it's coming from.
I bet you can figure it out from the time alone though.
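Here is a rough sketch of the timestamp idea, using SQLite from Python as a stand-in for SQL Server so that it runs anywhere. The table layout and the sample timestamps are made up; on SQL Server the column default would be getdate() instead of CURRENT_TIMESTAMP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments_daily (
        payment_id INTEGER PRIMARY KEY,
        amount REAL,
        -- Filled in by the database, so no loading process has to know about it.
        inserted_datetime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# The unknown loader inserts without mentioning the new column...
conn.execute("INSERT INTO payments_daily (amount) VALUES (10.0)")
# ...and still gets a timestamp from the default.
stamp = conn.execute("SELECT inserted_datetime FROM payments_daily").fetchone()[0]

# Simulated history (hypothetical values) standing in for several daily loads.
history = [("2024-03-01 11:15:02",), ("2024-03-02 11:15:07",), ("2024-03-03 11:15:01",)]
conn.executemany(
    "INSERT INTO payments_daily (amount, inserted_datetime) VALUES (1.0, ?)", history
)


def load_times(conn):
    """Count rows per insertion minute, busiest minute first."""
    return conn.execute(
        """
        SELECT strftime('%H:%M', inserted_datetime) AS minute, COUNT(*) AS n
        FROM payments_daily
        GROUP BY minute
        ORDER BY n DESC, minute
        """
    ).fetchall()


print(load_times(conn))
```

The busiest minute comes out first (11:15 with the sample data), and that is the time to compare against your job schedules and SSIS run logs.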

Solr DataImportHandler delta import

I am using DataImportHandler for indexing data in Solr. I used full-import to index all the data in my database, which is around 10,000 products. Now I am confused about how delta-import is used. Does it index new data added to the database on an interval basis (say, 10 new rows added to my table), or does it only update changes to data that is already indexed?
Can anyone please explain it to me with simple example as soon as you can.
The DataImportHandler can be a little daunting. Your initial query has loaded 10,000 unique products. This is what gets loaded if you specify /dataimport?command=full-import.
When this import is done, the DIH stores a variable (${dataimporter.last_index_time}) which holds the date/time of that last import.
In order to do an update, you specify a deltaQuery. The deltaQuery is meant to identify the records that have changed in your database since the last update. So, you specify a query like this:
SELECT product_id
FROM sometable
WHERE [date_update] >= '${dataimporter.last_index_time}'
This will retrieve all the product_ids from your database that have been updated since your last import. The next query you need to specify (deltaImportQuery) is the one that retrieves the full record for each product_id returned by the previous step.
Assuming product_id is your unique key, Solr will figure out whether it needs to update an existing record or add a new one if the product_id doesn't exist yet.
In order to execute the deltaQuery and the deltaImportQuery you use /dataimport?command=delta-import
This is a great simplification of all the possibilities. Check the Solr wiki on DataImportHandler; it is a VERY powerful tool!
On another note:
When you run delta imports within a small time window (like a couple of times in a few seconds) and the database server is on a different machine than the Solr index service, make sure that the system time of both machines matches, since the timestamp in [date_update] is generated on the database server and dataimporter.last_index_time is generated on the other.
Otherwise you will index too little or too much, depending on the time difference.
I agree that the Data Import Handler can handle this situation. One important limitation to the DIH is that it does not queue requests. The result of this is that if the DIH is "busy" indexing it will ignore all future DIH requests until it is "idle" again. The skipped DIH requests are lost and not executed.
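To make the full-import / delta-import flow concrete, here is a small simulation in plain Python. It is not Solr code: an in-memory SQLite table plays the database, a dict plays the index, and the function names and sample rows are made up. It only shows the logic of the two queries described above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sometable (product_id INTEGER PRIMARY KEY, name TEXT, date_update TEXT)"
)
db.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?)",
    [(1, "lamp", "2024-01-01 09:00:00"), (2, "desk", "2024-01-01 09:00:00")],
)

index = {}  # stand-in for the Solr index, keyed by the unique key


def full_import(now):
    """Rebuild the index from every row and remember when it happened."""
    index.clear()
    for pid, name in db.execute("SELECT product_id, name FROM sometable"):
        index[pid] = name
    return now  # plays the role of dataimporter.last_index_time


def delta_import(last_index_time, now):
    """Re-index only rows changed since last_index_time; return the new timestamp."""
    # deltaQuery: which keys changed since the last import?
    changed = [
        row[0]
        for row in db.execute(
            "SELECT product_id FROM sometable WHERE date_update >= ?",
            (last_index_time,),
        )
    ]
    # deltaImportQuery: fetch the full record for each changed key, then upsert it.
    for pid in changed:
        (name,) = db.execute(
            "SELECT name FROM sometable WHERE product_id = ?", (pid,)
        ).fetchone()
        index[pid] = name  # updates an existing document or adds a new one
    return now


last = full_import("2024-01-01 10:00:00")
# After the full import, one product is edited and one is added.
db.execute(
    "UPDATE sometable SET name = 'floor lamp', "
    "date_update = '2024-01-02 08:00:00' WHERE product_id = 1"
)
db.execute("INSERT INTO sometable VALUES (3, 'chair', '2024-01-02 08:30:00')")
last = delta_import(last, "2024-01-02 09:00:00")
print(index)
```

So the answer to the original question is "both": the delta import picks up the edited product and the newly added one, as long as date_update is set on insert as well as on update. Product 2 is untouched because its date_update is older than the last import.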

How can I find out where a database table is being populated from?

I'm in charge of an Oracle database for which we don't have any documentation. At the moment I need to know how a table is getting populated.
How can I find out which procedure, trigger, or other source, this table is getting its data from?
Or even better, query the DBA_DEPENDENCIES view (or its USER_ equivalent, USER_DEPENDENCIES). You should see which objects depend on the table and who owns them.
select owner, name, type, referenced_owner
from dba_dependencies
where referenced_name = 'YOUR_TABLE'
And yeah, you still need to look through those objects to see whether an INSERT is happening in any of them.
Also this, from my comment above:
If it is not a production system, I would suggest raising a user-defined exception in a BEFORE INSERT trigger with some custom message, or locking the table against INSERTs, and then watching which applications fail when they try to insert into it. But yeah, you might also get calls from many angry people.
It is quite simple ;-)
SELECT * FROM USER_SOURCE WHERE UPPER(TEXT) LIKE '%NAME_OF_YOUR_TABLE%';
In the output you'll have all the procedures, functions, and so on that reference your table called NAME_OF_YOUR_TABLE in their body.
NAME_OF_YOUR_TABLE has to be written UPPERCASE because we are using UPPER(TEXT) in order to retrieve results as Name_Of_Your_Table, NAME_of_YOUR_table, NaMe_Of_YoUr_TaBlE, and so on.
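That query returns every mention of the table, including plain SELECTs. If you export the rows, a small script can keep only the lines that look like writes. This is a heuristic sketch: the sample rows are invented, and the tuple layout (NAME, TYPE, LINE, TEXT) simply mirrors the USER_SOURCE columns. It will miss statements split across lines, dynamic SQL built from string fragments, and writes that go through a synonym.

```python
import re


def find_writers(source_rows, table_name):
    """Return (name, type, line) for source lines that appear to write to table_name."""
    pattern = re.compile(
        r"\b(insert\s+into|merge\s+into|update|delete\s+from)\s+(\w+\.)?"
        + re.escape(table_name)
        + r"\b",
        re.IGNORECASE,  # same effect as UPPER(TEXT) in the query above
    )
    return [
        (name, obj_type, line)
        for name, obj_type, line, text in source_rows
        if pattern.search(text)
    ]


# Hypothetical rows, shaped like SELECT name, type, line, text FROM user_source.
rows = [
    ("LOAD_PAYMENTS", "PROCEDURE", 12, "  insert into payments_daily (id, amt)"),
    ("LOAD_PAYMENTS", "PROCEDURE", 13, "  select id, amt from staging_payments;"),
    ("PAYMENTS_RPT", "FUNCTION", 4, "  select count(*) into n from PAYMENTS_DAILY;"),
    ("FIX_PAYMENTS", "PROCEDURE", 7, "  UPDATE dw.Payments_Daily SET amt = 0"),
]
print(find_writers(rows, "PAYMENTS_DAILY"))
```

With the sample rows, only LOAD_PAYMENTS line 12 and FIX_PAYMENTS line 7 are reported; the reporting function that merely reads the table is filtered out.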
Another thought is to try querying v$sql to find a statement that performs the update. You may get something from the module/action (or in 10g program_id and program_line#).
DML changes are recorded in *_TAB_MODIFICATIONS.
Without creating triggers you can use LogMiner to find all data changes and the session each one came from.
With a trigger you can record SYS_CONTEXT variables into a table.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions165.htm#SQLRF06117
Sounds like you want to audit.
How about
AUDIT ALL ON ::TABLE::;
Alternatively apply DBMS_FGA policy on the table and collect the client, program, user, and maybe the call stack would be available too.
Late to the party!
I second Gary's mention of v$sql also. That may yield the quick answer as long as the query hasn't been flushed.
If you know it's in your current instance, I like a combination of what has been used above; if there is no dynamic SQL, xxx_Dependencies will work and work well.
Join that to xxx_Source to get that pesky dynamic SQL.
We are also bringing data into our dev instance using the SQL*Plus copy command (careful! deprecated!), but data can be introduced by imp or impdp as well. Check xxx_Directories for the directories blessed to bring data in/out.

How to track data changes in a database table

What is the best way to track changes in a database table?
Imagine you have an application in which users (in the context of the application, not DB users) are able to change data stored in some database table. What's the best way to track a history of all changes, so that you can show which user changed which data, at what time, and how?
In general, if your application is structured into layers, have the data access tier call a stored procedure on your database server to write a log of the database changes.
In languages that support it, aspect-oriented programming can be a good technique for this kind of application. Auditing database table changes is the kind of thing you'll typically want to log for all operations, so AOP can work very nicely.
Bear in mind that logging database changes will create lots of data and will slow the system down. It may be sensible to use a message-queue solution and a separate database to perform the audit log, depending on the size of the application.
It's also perfectly feasible to use stored procedures to handle this, although there may be a bit of work involved passing user credentials through to the database itself.
You've got a few issues here that don't relate well to each other.
At the basic database level you can track changes by having a separate table that gets an entry added to it via triggers on INSERT/UPDATE/DELETE statements. That's the general way of tracking changes to a database table.
The other thing you want is to know which user made the change. Generally your triggers wouldn't know this. I'm assuming that if you want to know which user changed a piece of data, then it's possible that multiple users could change the same data.
There is no single right way to do this. You'll probably want a separate table that your application code inserts a record into whenever a user updates some data in the other table, including the user, the timestamp, and the id of the changed record.
Make sure to use a transaction so you don't end up with cases where update gets done without the insert, or if you do the opposite order you don't end up with insert without the update.
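A minimal sketch of that transaction point, using SQLite from Python as a stand-in database. The customer and customer_audit tables and the update_email function are invented for illustration; the point is only that the data change and its audit row are committed together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE customer_audit (
        customer_id INTEGER NOT NULL,
        changed_by  TEXT NOT NULL,
        changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO customer VALUES (1, 'old@example.com');
    """
)


def update_email(customer_id, new_email, app_user):
    """Apply the change and its audit row together, or neither."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "UPDATE customer SET email = ? WHERE id = ?", (new_email, customer_id)
        )
        conn.execute(
            "INSERT INTO customer_audit (customer_id, changed_by) VALUES (?, ?)",
            (customer_id, app_user),
        )


update_email(1, "new@example.com", "alice")
try:
    # The audit insert violates NOT NULL, so the UPDATE is rolled back too.
    update_email(1, "broken@example.com", None)
except sqlite3.IntegrityError:
    pass

email = conn.execute("SELECT email FROM customer WHERE id = 1").fetchone()[0]
audit_rows = conn.execute("SELECT COUNT(*) FROM customer_audit").fetchone()[0]
print(email, audit_rows)
```

After the failed second call the email is still the one alice set and there is exactly one audit row, so the two tables never disagree about what happened.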
One method I've seen quite often is to have audit tables. Then you can show just what's changed, what's changed and what it changed from, or whatever your heart desires :) Then you could write up a trigger to do the actual logging. Not too painful if done properly...
No matter how you do it, though, it kind of depends on how your users connect to the database. Are they using a single application user via a security context within the app, are they connecting using their own accounts on the domain, or does the app just have everyone connecting with a generic sql-account?
If you aren't able to get the user info from the database connection, it's a little more of a pain. And then you might look at doing the logging within the app, so if you have a process called "CreateOrder" or whatever, you can log to the Order_Audit table or whatever.
Doing it all within the app opens yourself up a little more to changes made from outside of the app, but if you have multiple apps all using the same data and you just wanted to see what changes were made by yours, maybe that's what you wanted... <shrug>
Good luck to you, though!
--Kevin
In researching this same question, I found a discussion here very useful. It suggests having a parallel table set for tracking changes, where each change-tracking table has the same columns as what it's tracking, plus columns for who changed it, when, and if it's been deleted. (It should be possible to generate the schema for this more-or-less automatically by using a regexed-up version of your pre-existing scripts.)
Suppose I have a Person table with 10 columns, including PersonSid and UpdateDate, and I want to keep track of any updates to it.
Here is the simple technique I used:
Create a person_log table:
create table person_log(date datetime2, sid int);
Create a trigger on the Person table that inserts a row into person_log whenever Person gets updated:
create trigger tr on dbo.Person
for update
as
insert into person_log(date, sid) select UpdateDate, PersonSid from inserted
After any update, query person_log and you will see the PersonSid values that were updated.
You can do the same for inserts and deletes.
The above example is for SQL Server; let me know if you have any questions, or use this link:
https://web.archive.org/web/20211020134839/https://www.4guysfromrolla.com/webtech/042507-1.shtml
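For anyone who wants to try the pattern without a SQL Server instance, here is a sketch of the same idea covering inserts, updates, and deletes. It uses SQLite from Python, so the trigger syntax (NEW/OLD rows instead of the inserted/deleted tables) differs from the T-SQL above, and the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_log (
        action     TEXT NOT NULL,
        person_sid INTEGER NOT NULL,
        old_name   TEXT,
        new_name   TEXT,
        logged_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER person_ai AFTER INSERT ON person BEGIN
        INSERT INTO person_log (action, person_sid, new_name)
        VALUES ('INSERT', NEW.person_sid, NEW.name);
    END;

    CREATE TRIGGER person_au AFTER UPDATE ON person BEGIN
        INSERT INTO person_log (action, person_sid, old_name, new_name)
        VALUES ('UPDATE', NEW.person_sid, OLD.name, NEW.name);
    END;

    CREATE TRIGGER person_ad AFTER DELETE ON person BEGIN
        INSERT INTO person_log (action, person_sid, old_name)
        VALUES ('DELETE', OLD.person_sid, OLD.name);
    END;
    """
)

# Ordinary writes; none of them mention person_log.
conn.execute("INSERT INTO person VALUES (1, 'Ann')")
conn.execute("UPDATE person SET name = 'Anne' WHERE person_sid = 1")
conn.execute("DELETE FROM person WHERE person_sid = 1")

log = conn.execute(
    "SELECT action, person_sid, old_name, new_name FROM person_log ORDER BY rowid"
).fetchall()
for entry in log:
    print(entry)
```

The log ends up with one row per operation, including the old and new values, even though the row itself no longer exists in person.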
A trace log in a separate table (with an ID column, possibly with timestamps)?
Are you going to want to undo the changes as well - perhaps pre-create the undo statement (a DELETE for every INSERT, an (un-) UPDATE for every normal UPDATE) and save that in the trace?
Let's try with this open source component:
https://tabledependency.codeplex.com/
TableDependency is a generic C# component used to receive notifications when the content of a specified database table changes.
If all changes come from PHP, you can use a class to log every INSERT/UPDATE/DELETE before the query runs. It saves the action, table, column, newValue, oldValue, date, system (if needed), ip, UserAgent, columnReference, operatorReference, and valueReference. All the tables/columns/actions that need to be logged are configurable.

SQL Server 2000: Is there a way to tell when a record was last modified?

The table doesn't have a last updated field and I need to know when existing data was updated. So adding a last updated field won't help (as far as I know).
SQL Server 2000 does not keep track of this information for you.
There may be creative / fuzzy ways to guess what this date was depending on your database model. But, if you are talking about 1 table with no relation to other data, then you are out of luck.
You can't check for changes without some sort of audit mechanism. You are looking to extract information that has not been collected. If you just need to know when a record was added or edited, adding a datetime field that gets updated via a trigger when the record is updated would be the simplest choice.
If you also need to track when a record has been deleted, then you'll want to use an audit table and populate it from triggers with a row when a record has been added, edited, or deleted.
You might try a log viewer; this basically just lets you look at the transactions in the transaction log, so you should be able to find the statement that updated the row in question. I wouldn't recommend this as a production-level auditing strategy, but I've found it to be useful in a pinch.
Here's one I've used; it's free and (only) works w/ SQL Server 2000.
http://www.red-gate.com/products/SQL_Log_Rescue/index.htm
You can add a timestamp field to that table and update that timestamp value with an update trigger.
OmniAudit is a commercial package which implements auditing across an entire database.
A free method would be to write a trigger for each table which adds entries to an audit table when fired.
