This is a hypothetical question - the problem listed below is entirely fictional, but I believe if anyone has an answer it could prove useful for future reference.
We have a situation wherein multiple systems all populate the same data table on our SQL Server. One of these systems seems to be populating the table incorrectly, albeit in a consistent pattern (leading me to believe the bug lies in a single system, not several). These are mostly third-party systems: we cannot view or modify their source code, nor alter their functionality. We want to file a bug report with the culprit system's developer, but we don't know which system it is, as the systems leave no identifiable trace on the table - those in charge before me, back when the database was new and only occasionally used by a single system, believed that a single timestamp field was an adequate audit, and this has never been reconsidered.
Our solution has to be entirely SQL-based. Our thought was to write a trigger on the table and somehow pull through the source of the query - i.e., where it came from - but we don't know how, or even whether that's possible.
There are some obvious solutions - e.g. ask all the developers to update their software to populate a new software_ID field, and then use the new information to identify the faulty system later (and save my fictional self similar headaches down the road) - but I'm particularly interested to know whether anything could be done purely in-house on SQL Server (or another clever solution) given the restrictions noted.
You can use the built-in functions:
select HOST_NAME(), APP_NAME()
So you will know the computer and application that caused the changes.
And you can modify the application connection string to add a custom application name, for example:
"Data Source=SQLServerExpress;Initial Catalog=TestDB;
Integrated Security=True; Application Name=MyProgramm"
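If you cannot change the connection strings of the third-party systems, a trigger can still capture these two values at insert time. A minimal sketch, assuming a hypothetical audit table dbo.MyTable_Audit, that the shared table is dbo.MyTable, and that it has an integer key column Id:

create table dbo.MyTable_Audit (
    AuditedAt datetime      not null default getdate(),
    HostName  nvarchar(128) null,
    AppName   nvarchar(128) null,
    RowId     int           null    -- whatever key identifies the inserted row
)
go

create trigger dbo.trg_MyTable_Audit
on dbo.MyTable
after insert
as
begin
    set nocount on;

    insert into dbo.MyTable_Audit (HostName, AppName, RowId)
    select HOST_NAME(), APP_NAME(), i.Id
    from inserted as i;
end
go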
You could create a copy of the table in question with one additional nvarchar field to hold the identifier.
Then create a trigger for insert (and maybe update) on the table, and in the trigger insert the same rows into the copy, adding in the identifier. The identifier could be, for instance, the login name on the connection:
insert into tableCopy select SUSER_SNAME(), inserted.* from inserted
or maybe a client IP:
declare @clientIp varchar(255);

SELECT @clientIp = client_net_address
FROM sys.dm_exec_connections
WHERE session_id = @@SPID

insert into tableCopy select @clientIp, inserted.* from inserted
or possibly something else that you could get from the connection context (for lack of a more precise term) that can identify the client application.
Make sure though that inserting into the table copy will under no circumstances cause errors. Primary keys and indexes should probably be dropped from the copy.
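Pulling those pieces together, a sketch of what such a trigger might look like (table and column names are hypothetical, and the TRY/CATCH is there so a failed audit insert can never break the source systems' own inserts):

create trigger dbo.trg_MyTable_Copy
on dbo.MyTable
after insert
as
begin
    set nocount on;
    begin try
        -- tableCopy has the same columns as the original table,
        -- plus one leading nvarchar column for the identifier
        insert into dbo.tableCopy
        select SUSER_SNAME() + N' / ' + APP_NAME(), i.*
        from inserted as i;
    end try
    begin catch
        -- deliberately swallow errors: auditing must never block the writers
    end catch;
end
go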
Just an idea: create a trigger that saves, in a dedicated table, the info returned by EXEC sp_who2 whenever a suspicious value is stored in the table.
Maybe you can filter the sp_who2 output by status RUNNABLE.
So, even if multiple systems share the same login, you can determine the exact moment at which the command was executed and start your research from there...
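A sketch of how the sp_who2 output could be captured via INSERT ... EXEC into a logging table, which a trigger could then call. The staging table must match the sp_who2 result set column for column, and that column list varies between SQL Server versions, so verify it before relying on this:

create table dbo.WhoLog (
    CapturedAt  datetime not null default getdate(),
    SPID int, Status varchar(60), LoginName varchar(128), HostName varchar(128),
    BlkBy varchar(30), DBName varchar(128), Command varchar(60),
    CPUTime bigint, DiskIO bigint, LastBatch varchar(60), ProgramName varchar(256),
    SPID2 int, RequestID int
)
go

insert into dbo.WhoLog (SPID, Status, LoginName, HostName, BlkBy, DBName, Command,
                        CPUTime, DiskIO, LastBatch, ProgramName, SPID2, RequestID)
exec sp_who2
go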
I am writing code which supports different versions of Sybase ASE. I am using union queries, and the problem is that different versions of Sybase ASE support different numbers of tables in a union query. The union query is dynamic and is built according to the number of databases present on the server.
Is there any way to find the maximum number of tables supported in a union query by a particular Sybase ASE version? The only solution I know of right now is to fetch the version with a query, pick the version number out of the result, and set the limit accordingly in the code, but this is not a very good solution. I checked whether there are any tables which store this value, but nothing came up. Can anyone suggest a solution for this?
Since that's my SAP response you've re-posted here, I'll add some more notes ...
that was a proof of concept that answered the basic question of how to get the info via T-SQL; it was assumed anyone actually looking to implement the solution would (eventually) get around to addressing the various issues re: overhead/maintenance, eg ...
setting a tracefile is going to require permissions to do it; which permissions depends on whether or not you've got granular permissions enabled (see the notes for the 'set tracefile' command in the Reference manual); you'll need to decide if/how you want to grant the permissions to other users
while it's true you cannot re-use the tracefile, you can create a proxy table for the directory where the tracefile exists, then 'delete' the tracefile from the directory, eg:
create proxy_table tracedir external directory at '/tmp'
go
delete tracedir where filename = 'my_serverlimits'
go
if you could have multiple copies of the proxy table solution running at the same time then you'll obviously (?) need to make sure you generate a unique tracefile name for each session; while you could do this by appending @@spid to the file name, you could also add the login name (suser_name()), the kpid (select KPID from master..monProcess where SPID = @@spid), etc; you'll also want to make sure such a file doesn't exist before trying to create it (eg, delete tracedir where filename = '.....'; set tracefile ...)
your error (when selecting from the proxy table) appears to be related to your client application running in transaction isolation level 0 (which, by default, requires a unique index on the table ... not something you're going to accomplish against a proxy table pointing to an OS file); try setting your isolation level to 1, or use a client application that doesn't default to isolation level 0 (eg, that example runs fine with the basic isql command line tool)
if this solution were to be productionalized then you'll probably want to get a separate filesystem allocated so that any 'run away' tracing sessions don't fill up an important filesystem (eg, /var, /tmp, $SYBASE, etc)
also from a production/security perspective, I'd probably want to investigate the possibility of encapsulating a lot of the details in a DBA/system proc (created to execute under the permissions of the creator) so as to ensure developers can't create tracefiles in the 'wrong' directories ... and on and on and on re: control/security ...
Then again ...
If you're going to be doing this a LOT ... and you're only interested in the max number of tables in a (union) query, then it'd probably be much easier to just build a static if/then/else (or case) expression that matches your ASE version with the few possible numbers (see RobV's post).
Let's face it, how often are you really, Really, REALLY going to be building a query with more than, say, 100 tables, let alone 500, 1000, more? [You really don't want to deal with trying to tune such a monster!! YIKES] Realistically speaking, I can't see any reason why you'd want to productionalize the proxy table solution just to access a single row from dbcc serverlimits when you could just implement a hard limit (eg, max of 100 tables).
And the more I think about it, as a DBA I'm going to do whatever I can to make sure your application can't create some monster, multi-hundred table query that ends up bogging down my dataserver simply because the developer couldn't come up with a more efficient solution. [And heaven forbid this type of application gets rolled out to the general user community, ie, I'd have to deal with dozens/hundreds of copies of this monster running in my dataserver?!?!?!]
You can get such limits by running 'dbcc serverlimits' (enable traceflag 3604 first).
Up until version 15.7, the maximum was 256.
In 16.0, this was raised to 512.
In 16.0 SP01, this was raised again to 1023.
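Putting the static if/then/else idea from the earlier answer together with these documented limits, a rough sketch in Sybase T-SQL; the version-string patterns are illustrative only, so check them against your own @@version output:

declare @max_union_tables int

select @max_union_tables =
    case
        when @@version like '%/16.0 SP0[1-9]%' then 1023  -- 16.0 SP01 and later
        when @@version like '%/16.0%'          then 512   -- 16.0 before SP01
        else 256                                          -- 15.7 and earlier
    end

select @max_union_tables as max_union_tables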
I suggest you open a case/ticket with SAP support to ask whether there is any system table that stores this information. If there is none, I would implement the tedious solution you mentioned and monitor the ASE 15.7 logs for the following error:
CR 805525 -- If you exceed the number of tables in a UNION query you can get a signal 11 in ord_getrowbounds instead of an error message.
This is the answer that I got from the SAP community
-- enable trace file for your spid
set tracefile '/tmp/my_serverlimits' for @@spid
go
-- dump dbcc serverlimits output to your tracefile
dbcc serverlimits
go
-- turn off tracing
set tracefile off for @@spid
go
-- enable external file access:
sp_configure 'enable file access',1
go
-- create proxy table pointing at the trace file
create proxy_table dbcc_serverlimits external file at '/tmp/my_serverlimits'
go
-- find our column name ('record' of type varchar(255) in this case)
sp_help dbcc_serverlimits
go
-- extract the desired row; store the 'record' value in a @variable
-- and parse for the desired info ...
select * from dbcc_serverlimits where lower(record) like '%union%'
go
record
------------------------------------------------------------------------
Max number of user tables overall in a statement using UNIONs : 512
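The comment above leaves the parsing step open; a minimal sketch of pulling the number out of the 'record' text, assuming the value always follows the colon:

declare @rec varchar(255), @max_union_tables int

select @rec = record
from dbcc_serverlimits
where lower(record) like '%union%'

select @max_union_tables =
       convert(int, ltrim(rtrim(substring(@rec, charindex(':', @rec) + 1, 16))))

select @max_union_tables as max_union_tables
go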
There are some problems with this approach, though. The first issue is setting the trace file: I am going to use this code more or less daily, and in Sybase I don't think we can delete or overwrite a trace file. The second is the proxy table: it will have to be dropped each time, but that can be taken care of with the following code
IF exists (select 1 from sysobjects where type = 'U' and name = 'dbcc_serverlimits')
begin
    drop table dbcc_serverlimits
end
go
Final problem comes when a select query is made from dbcc_serverlimits table. It throws the following error
Could not execute statement. The optimizer could not find a unique
index which it could use to scan table 'dbo.dbcc_serverlimits' for
cursor 'jconnect_implicit_26'. SQLCODE=311 Server=************,
Severity Level=16, State=2, Transaction State=1, Line=1 Line 24
select * from dbcc_serverlimits
All these commands will have to be wrapped up in a procedure (that is what I am thinking). Any more elegant solution?
Background
I have a multi-tenant scenario and a single SQL Server project that will be deployed into multiple database instances on the same server. There will be one DB for each tenant, plus one "model" DB.
The "model" database serves three purposes:
Force some "system" data to be always present in each tenant database
Serves as an access point for users with a special permission to edit system data (which is then synced out to all tenants at specific points in time)
When creating a new tenant, the database will be copied and attached with a new name representing the tenant
There are triggers that check whether the modified/deleted data within a tenant DB corresponds to "system" data inside the "model" DB. If it does, an error is raised saying that system data cannot be altered.
Issue
So here's a part of the trigger that checks if deletion can be allowed:
IF DB_NAME() <> 'ModelTenant' AND EXISTS
(
SELECT
[deleted].*
FROM
[deleted]
INNER JOIN [---MODEL DB NAME??? ---].[MySchema].[MyTable] [ModelTable]
ON [deleted].[Guid] = [ModelTable].[Guid]
)
BEGIN
    ;THROW 50000, 'The DELETE operation on table MyTable cannot be performed. At least one targeted record is reserved by the system and cannot be removed.', 1;
END
I can't seem to find what should take the place of --- MODEL DB NAME??? --- in the above code so that the project compiles properly. When referring to a completely different project I know what to do: use a reference to that project, represented by a SQLCMD variable. But in this scenario the reference is essentially to the same project, only on a different database, and I can't seem to add a self-reference in this manner.
What can I do? Does SSDT offer some kind of support for such a scenario?
Have you tried setting up a Database Variable? You can read under "Reference aware statements" here. You could then say:
SELECT * FROM [$(MyModelDb)].[MySchema].[MyTable] [ModelTable]
If you don't have a specific project for $(MyModelDb) you can choose the option to "suppress errors by unresolved references...". It's been forever since I've used SSDT projects, but I think that should work.
TIP: If you need to reference one table 100 times, you may find it better to create a SYNONYM that uses the database variable, then point to the SYNONYM in your SPROCs/TRIGGERs. Why? Because that way you don't need to redeploy your SPROCs/TRIGGERs to get the variable replaced with the actual value, and that can make development smoother.
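For example, the synonym might be created once in the project like this (names taken from the question; the database variable is assumed to be $(MyModelDb)), and the trigger then joins to [MySchema].[ModelTable] instead of the three-part name:

CREATE SYNONYM [MySchema].[ModelTable] FOR [$(MyModelDb)].[MySchema].[MyTable];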
I'm not quite sure if SSDT is particularly well-suited to projects of any decent amount of complexity. I can think of one or two ways to most likely accomplish this (especially depending on exactly how you do the publishing / deployment), but I think you would actually lose more than you gain. What I mean by that is: you could add steps to get this to work (i.e. win the battle), but you would be creating a more complex system in order to get SSDT to publish a system that is more complex (and slower) than it needs to be (i.e. lose the war).
Before worrying about SSDT, let's look at why you need/want SSDT to do this in the first place. You have system data intermixed with tenant data, and you need to validate UPDATE and DELETE operations to ensure that the system data does not get modified, and the only way to identify data that is "system" data is by matching it to a home-of-record -- ModelDB -- based on GUID PKs.
That theory of identifying which data belongs to the "system" and not to a tenant is your main problem, not SSDT. You are definitely on the right track for a multi-tenant system by having the "model" database, but using it for data validation is a poor design choice: on top of the performance degradation already incurred from using GUIDs as PKs, you are further slowing down all of these UPDATE and DELETE operations by funneling them through a single point of contention, since all tenant DBs need to check this common source.
You would be far better off to include a BIT field in each of these tables that mixes system and tenant data, denoting whether the row was "system" or not. Just look at the system catalog views within SQL Server:
sys.objects has an is_ms_shipped column
sys.assemblies went the other direction and has an is_user_defined column.
So, if you were to add an [IsSystemData] BIT NOT NULL column to these tables, your Trigger logic would become:
IF DB_NAME() <> N'ModelTenant' AND EXISTS
(
SELECT del.*
FROM [deleted] del
WHERE del.[IsSystemData] = 1
)
BEGIN
;THROW 50000, 'The DELETE operation on table MyTable cannot be performed. At least one targeted record is reserved by the system and cannot be removed.', 1;
END;
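For reference, adding the flag column itself might look like this (table and constraint names are hypothetical; existing rows are back-filled as tenant data):

ALTER TABLE [MySchema].[MyTable]
    ADD [IsSystemData] BIT NOT NULL
        CONSTRAINT [DF_MyTable_IsSystemData] DEFAULT (0);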
Benefits:
No more SSDT issue (at least not from this part ;-)
Faster UPDATE and DELETE operations
Less contention on the shared resource (i.e. ModelDB)
Less code complexity
As an alternative to referencing another database project, you can produce a dacpac, then reference the dacpac as a database reference in "same server, different database" mode.
Can I find out when the last INSERT, UPDATE or DELETE statement was performed on a table in an Oracle database and if so, how?
A little background: The Oracle version is 10g. I have a batch application that runs regularly, reads data from a single Oracle table and writes it into a file. I would like to skip this if the data hasn't changed since the last time the job ran.
The application is written in C++ and communicates with Oracle via OCI. It logs into Oracle with a "normal" user, so I can't use any special admin stuff.
Edit: Okay, "Special Admin Stuff" wasn't exactly a good description. What I mean is: I can't do anything besides SELECTing from tables and calling stored procedures. Changing anything about the database itself (like adding triggers) is sadly not an option if I want to get it done before 2010.
I'm really late to this party but here's how I did it:
SELECT SCN_TO_TIMESTAMP(MAX(ora_rowscn)) from myTable;
It's close enough for my purposes.
Since you are on 10g, you could potentially use the ORA_ROWSCN pseudocolumn. That gives you an upper bound of the last SCN (system change number) that caused a change in the row. Since this is an increasing sequence, you could store off the maximum ORA_ROWSCN that you've seen and then look only for data with an SCN greater than that.
By default, ORA_ROWSCN is actually maintained at the block level, so a change to any row in a block will change the ORA_ROWSCN for all rows in the block. This is probably quite sufficient if the intention is to minimize the number of unchanged rows you process multiple times, assuming "normal" data access patterns. You can rebuild the table with ROWDEPENDENCIES, which causes ORA_ROWSCN to be tracked at the row level; that gives you more granular information but requires a one-time effort to rebuild the table.
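A minimal sketch of that high-water-mark approach, assuming the table is called mytable and the previous run's value is supplied as a bind variable:

-- at the end of a run, remember the highest SCN seen so far
SELECT NVL(MAX(ora_rowscn), 0) AS last_seen_scn FROM mytable;

-- on the next run, fetch only rows changed since then
SELECT *
FROM   mytable
WHERE  ora_rowscn > :last_seen_scn;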
Another option would be to configure something like Change Data Capture (CDC) and to make your OCI application a subscriber to changes to the table, but that also requires a one-time effort to configure CDC.
Ask your DBA about auditing. He can start an audit with a simple command like:
AUDIT INSERT ON user.table
Then you can query the table USER_AUDIT_OBJECT to determine if there has been an insert on your table since the last export.
Google "Oracle auditing" for more info...
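A sketch of what that check might look like, assuming the table is called MY_TABLE and that standard auditing is writing to the database audit trail:

SELECT username, timestamp, action_name
FROM   user_audit_object
WHERE  obj_name = 'MY_TABLE'
AND    action_name LIKE '%INSERT%'
ORDER BY timestamp DESC;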
SELECT * FROM all_tab_modifications;
Could you run a checksum of some sort on the result and store that locally? Then when your application queries the database, you can compare its checksum and determine if you should import it?
It looks like you may be able to use the ORA_HASH function to accomplish this.
Update: Another good resource: 10g’s ORA_HASH function to determine if two Oracle tables’ data are equal
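A rough sketch of a table-level checksum with ORA_HASH, assuming col1 and col2 are the columns that matter; store the value and compare it against the previous run's (NULL handling and column order need some care):

SELECT SUM(ORA_HASH(col1 || '|' || col2)) AS table_checksum
FROM   mytable;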
Oracle can watch tables for changes and, when a change occurs, execute a callback function in PL/SQL or OCI. The callback gets an object that is a collection of the tables which changed, each with a collection of the rowids that changed and the type of action (insert, update, delete).
So you don't even go to the table; you sit and wait to be called. You'll only go if there are changes to write.
It's called Database Change Notification. It's much simpler than the CDC that Justin mentioned, but both require some fancy admin stuff. The good part is that neither requires changes to the APPLICATION.
The caveat is that CDC is fine for high-volume tables, while DCN is not.
If auditing is enabled on the server, simply use
SELECT *
FROM ALL_TAB_MODIFICATIONS
WHERE TABLE_NAME IN ()
You would need to add a trigger on insert, update and delete that sets a value in another table to SYSDATE.
When you run the application, it would read that value and save it somewhere so that the next time it runs it has a reference to compare against.
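A sketch of that trigger approach with hypothetical names; a statement-level trigger keeps the overhead down to one extra write per INSERT/UPDATE/DELETE statement:

CREATE TABLE my_table_change_log (
    table_name  VARCHAR2(30) PRIMARY KEY,
    last_change DATE
);

CREATE OR REPLACE TRIGGER trg_my_table_last_change
AFTER INSERT OR UPDATE OR DELETE ON my_table
BEGIN
    UPDATE my_table_change_log
    SET    last_change = SYSDATE
    WHERE  table_name = 'MY_TABLE';

    IF SQL%ROWCOUNT = 0 THEN
        INSERT INTO my_table_change_log (table_name, last_change)
        VALUES ('MY_TABLE', SYSDATE);
    END IF;
END;
/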
Would you consider that "Special Admin Stuff"?
It would be better to describe what you're actually doing so you get clearer answers.
How long does the batch process take to write the file? It may be easiest to let it go ahead and then compare the file against a copy of the file from the previous run to see if they are identical.
If anyone is still looking for an answer, they can use the Database Change Notification feature introduced in Oracle 10g. It requires the CHANGE NOTIFICATION system privilege. You can register listeners so that a notification is triggered back to the application when changes occur.
Please use the statement below
select * from all_objects ao where ao.OBJECT_TYPE = 'TABLE' and ao.OWNER = 'YOUR_SCHEMA_NAME'
I'm in charge of an Oracle database for which we don't have any documentation. At the moment I need to know how a table is getting populated.
How can I find out which procedure, trigger, or other source, this table is getting its data from?
Or even better, query the DBA_DEPENDENCIES view (or its USER_ equivalent). You should see what objects are dependent on the table and who owns them.
select owner, name, type, referenced_owner
from dba_dependencies
where referenced_name = 'YOUR_TABLE'
And yeah, you need to look through those objects to see whether an INSERT into the table happens there.
Also this, from my comment above.
If it is not a production system, I would suggest you raise a user-defined exception in a BEFORE INSERT trigger with some custom message, or LOCK the table against INSERT and watch which applications fail when they try inserting into it. But yeah, you might also get calls from many angry people.
It is quite simple ;-)
SELECT * FROM USER_SOURCE WHERE UPPER(TEXT) LIKE '%NAME_OF_YOUR_TABLE%';
In the output you'll have all the procedures, functions, and so on that reference the table NAME_OF_YOUR_TABLE in their body.
NAME_OF_YOUR_TABLE has to be written in UPPERCASE: because we apply UPPER(TEXT), the search matches occurrences written as Name_Of_Your_Table, NAME_of_YOUR_table, NaMe_Of_YoUr_TaBlE, and so on.
Another thought is to try querying v$sql to find a statement that performs the update. You may get something from the module/action (or, in 10g, program_id and program_line#).
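A sketch of that kind of query, with a hypothetical table name; access to v$sql usually needs a grant from the DBA:

SELECT sql_text, module, action, program_id, program_line#, last_active_time
FROM   v$sql
WHERE  UPPER(sql_text) LIKE '%INSERT%'
AND    UPPER(sql_text) LIKE '%MY_TABLE%';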
DML changes are recorded in *_TAB_MODIFICATIONS.
Without creating triggers, you can use LogMiner to find all data changes and which session made them.
With a trigger you can record SYS_CONTEXT variables into a table.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions165.htm#SQLRF06117
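A sketch of that idea with hypothetical table names; the USERENV attributes shown are standard ones that identify the connected client:

CREATE OR REPLACE TRIGGER trg_my_table_who
AFTER INSERT OR UPDATE OR DELETE ON my_table
BEGIN
    INSERT INTO my_table_who_log (changed_at, os_user, host, ip_address, module)
    VALUES (SYSDATE,
            SYS_CONTEXT('USERENV', 'OS_USER'),
            SYS_CONTEXT('USERENV', 'HOST'),
            SYS_CONTEXT('USERENV', 'IP_ADDRESS'),
            SYS_CONTEXT('USERENV', 'MODULE'));
END;
/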
Sounds like you want to audit.
How about
AUDIT ALL ON ::TABLE::;
Alternatively, apply a DBMS_FGA policy to the table and collect the client, program, and user; the call stack may be available too.
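A sketch of registering such a policy; the schema, table, and policy names are hypothetical, and audited statements then show up in DBA_FGA_AUDIT_TRAIL:

BEGIN
    DBMS_FGA.ADD_POLICY(
        object_schema   => 'MY_SCHEMA',
        object_name     => 'MY_TABLE',
        policy_name     => 'MY_TABLE_DML_AUDIT',
        statement_types => 'INSERT,UPDATE,DELETE');
END;
/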
Late to the party!
I second Gary's mention of v$sql also. That may yield the quick answer as long as the query hasn't been flushed.
If you know it's in your current instance, I like a combination of what has been used above: if there is no dynamic SQL, xxx_Dependencies will work and work well.
Join that to xxx_Source to get that pesky dynamic SQL.
We are also bringing data into our dev instance using the SQL*Plus copy command (careful! deprecated!), but data can be introduced by imp or impdp as well. Check xxx_Directories for the directories blessed to bring data in/out.
A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE, too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is the better way to do it, and I don't mean the implementation of either method. Just which is better: return only the Id, or return the whole row?
We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.
The SELECT statement will have some sort of advantage only if the data is generated in the procedure. Otherwise the data you inserted is generally already available to you, so there is no point in selecting and returning it again, IMHO. If it's just for the ID, you can get it with SCOPE_IDENTITY(), which returns the last identity value created for the insert in the current scope.
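A minimal sketch of that pattern, with hypothetical table and column names:

INSERT INTO mytable (col1, col2)
VALUES ('value1', 'value2');

SELECT SCOPE_IDENTITY() AS new_id;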
Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows: names, dollars, whatever. But a few minutes' reflection, and reading the prior 6 (hmm, make that 5) replies, leads to a number of "it depends" situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex than that is when there are simple default values assigned during an insert statement. "DateCreated" columns that default to the current datetime, or "CreatedBy" columns that default to the current SQL login, are a prime example. I'd include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to be out of sync with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you'd need to pass those back.
And then there are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you're just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exist? If they're there and they matter, plan accordingly; if you do not or cannot know, then you might need to "follow up" on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from "Widget" to "Widget_201001270901".)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)
Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has @@IDENTITY. You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.
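A sketch of the output-parameter approach with hypothetical names; SCOPE_IDENTITY() is used here instead of @@IDENTITY so that identity values generated by triggers inserting into other tables are not picked up by mistake:

CREATE PROCEDURE dbo.mytable_insert
    @col1   varchar(50),
    @col2   varchar(50),
    @new_id int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO mytable (col1, col2)
    VALUES (@col1, @col2);

    SET @new_id = SCOPE_IDENTITY();
END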
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF @return_whole_row = 1
    INSERT
    INTO mytable (col1, col2)
    OUTPUT INSERTED.*
    VALUES ('value1', 'value2')
ELSE
    INSERT
    INTO mytable (col1, col2)
    OUTPUT INSERTED.id
    VALUES ('value1', 'value2')
I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parametrize a single proc to determine whether to return it over the wire or not. I doubt the DB overhead is significant (single-row, got to have a PK lookup), but the data on the wire from DB to client could be significant when all added up, and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.