Stored procedures reverse engineering - database

We're having problems with a huge number of legacy stored procedures at work. Do you guys recommend any tool that can help us better understand those procedures? Some kind of reverse engineering that identifies inter-procedure dependencies and/or procedure vs. table dependencies. It can be a free or commercial tool.
Thanks!

A cheaper solution than 'dependency tracker' is the data dictionary view sys.sql_dependencies, from which this data can be queried directly. Oracle has a data dictionary view with similar functionality called DBA_DEPENDENCIES (plus equivalent USER_ and ALL_ views). Using the other data dictionary tables (sys.tables, DBA_TABLES, etc.) you can generate object dependency reports.
If you're feeling particularly keen you can use a recursive query (Oracle CONNECT BY or SQL Server Common Table Expressions) to build a complete object dependency graph.
Here's an example of a recursive CTE on sys.sql_dependencies. It will return an entry for every dependency with its depth. Items can occur more than once, possibly at different depths, for every dependency relationship. I don't have a working Oracle instance to hand to build a CONNECT BY query on DBA_DEPENDENCIES so anyone with edit privileges and the time and expertise is welcome to annotate or edit this answer.
Note also that with sys.sql_dependencies you can get column references from referenced_minor_id. This could be used, for example, to determine which columns the ETL sprocs actually use when a staging area holds copies of the source tables with more columns than are needed.
with dep_cte as (
select o2.object_id as parent_id
,o2.name as parent_name
,o1.object_id as child_id
,o1.name as child_name
,d.referenced_minor_id
,1 as hierarchy_level
from sys.sql_dependencies d
join sys.objects o1
on o1.object_id = d.referenced_major_id
join sys.objects o2
on o2.object_id = d.object_id
where d.referenced_minor_id in (0,1)
and not exists
(select 1
from sys.sql_dependencies d2
where d2.referenced_major_id = d.object_id)
union all
select o2.object_id as parent_id
,o2.name as parent_name
,o1.object_id as child_id
,o1.name as child_name
,d.referenced_minor_id
,d2.hierarchy_level + 1 as hierarchy_level
from sys.sql_dependencies d
join sys.objects o1
on o1.object_id = d.referenced_major_id
join sys.objects o2
on o2.object_id = d.object_id
join dep_cte d2
on d.object_id = d2.child_id
where d.referenced_minor_id in (0,1)
)
select *
from dep_cte
order by hierarchy_level
I've opened this up to the community now. Could someone with convenient access to a running Oracle instance post a CONNECT BY recursive query here? Note that the query above is SQL Server specific, and the question owner has since made it clear that he's using Oracle. I don't have a running Oracle instance to hand to develop and test anything.
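The depth calculation done by the recursive CTE can also be reproduced outside the database, which may help when the dependency rows have been exported from either SQL Server or Oracle. The following is only a minimal sketch: the input format (a list of parent/child name pairs) and the function name `dependency_levels` are assumptions made for illustration, not part of any tool mentioned here.

```python
from collections import defaultdict


def dependency_levels(edges):
    """Return (parent, child, level) rows, with level 1 for the direct
    dependencies of root objects.

    `edges` is an iterable of (parent_name, child_name) pairs, e.g. exported
    from a dependency view (hypothetical input format).
    """
    children = defaultdict(list)
    referenced = set()
    for parent, child in edges:
        children[parent].append(child)
        referenced.add(child)

    # Roots are objects that nothing else depends on, like the NOT EXISTS
    # anchor of the CTE.
    roots = sorted(p for p in children if p not in referenced)

    rows = []

    def walk(parent, level, path):
        for child in children.get(parent, []):
            rows.append((parent, child, level))
            if child not in path:  # guard against circular references
                walk(child, level + 1, path | {child})

    for root in roots:
        walk(root, 1, {root})
    # Stable sort: rows keep their discovery order within each level.
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    sample = [("proc_a", "view_b"), ("view_b", "table_c"), ("proc_a", "table_c")]
    for row in dependency_levels(sample):
        print(row)
```

As with the CTE, an object can appear more than once at different depths. Unlike the CTE, the sketch stops when it meets an object already on the current path, so circular references do not recurse forever.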

Redgate has a rather expensive product called SQL Dependency Tracker that seems to fulfill the requirements.

I think the Red Gate Dependency Tracker mentioned by rpetrich is a decent solution. It works well, and Red Gate has a 30-day trial (ideally long enough for you to do your forensics).
I would also consider isolating the system and running SQL Profiler, which will show you all the SQL activity on the tables. This is often a good starting point for building a sequence diagram, or however you choose to document this code. Good luck!

Redgate SQL Doc: the generated documentation includes cross-referenced dependency information. For example, for each table, it lists the views, stored procedures, triggers, etc. that reference that table.

What database are the stored procedures in? Oracle, SQL Server, something else?
Edit based on comment: Given that you're using Oracle, have a look at TOAD. I use a feature in it called the Code Roadmap, which allows you to graphically display PL/SQL interdependencies within the database. It can run in Code Only mode, showing runtime call stack dependencies, or Code Plus Data mode, where it also shows the database objects (tables, views, triggers) that are touched by your code.
(Note - I am a TOAD user, and gain no benefit from recommending it)

This isn't real deep or thorough, but I think that if you're using MS SQL Server or Oracle (perhaps Nigel can help with a PL/SQL sample), Nigel is on to something. This only goes 3 dependencies deep, but could be modified to go however deep you need. It's not the prettiest thing, but it's functional.
select
so.name + case when so.xtype='P' then ' (Stored Proc)' when so.xtype='U' then ' (Table)' when so.xtype='V' then ' (View)' else ' (Unknown)' end as EntityName,
so2.name + case when so2.xtype='P' then ' (Stored Proc)' when so2.xtype='U' then ' (Table)' when so2.xtype='V' then ' (View)' else ' (Unknown)' end as FirstDependancy,
so3.name + case when so3.xtype='P' then ' (Stored Proc)' when so3.xtype='U' then ' (Table)' when so3.xtype='V' then ' (View)' else ' (Unknown)' end as SecondDependancy,
so4.name + case when so4.xtype='P' then ' (Stored Proc)' when so4.xtype='U' then ' (Table)' when so4.xtype='V' then ' (View)' else ' (Unknown)' end as ThirdDependancy
from
sysdepends sd
inner join sysobjects as so on sd.id=so.id
left join sysobjects as so2 on sd.depid=so2.id
left join sysdepends as sd2 on so2.id=sd2.id and so2.xtype not in ('S','PK','D')
left join sysobjects as so3 on sd2.depid=so3.id and so3.xtype not in ('S','PK','D')
left join sysdepends as sd3 on so3.id=sd3.id and so3.xtype not in ('S','PK','D')
left join sysobjects as so4 on sd3.depid=so4.id and so4.xtype not in ('S','PK','D')
where so.xtype = 'P' and left(so.name,2)<>'dt'
group by so.name, so2.name, so3.name, so4.name, so.xtype, so2.xtype, so3.xtype, so4.xtype

How to find the dependency chain of a database object (MS SQL Server 2000(?)+)
by Jacob Sebastian
Every time he needs to deploy a new report or modify an existing
report, he needs to know what are the database objects that depend on
the given report stored procedure. Some times the reports are very
complex and each stored procedure might have dozens of dependent
objects and each dependent object may be depending on other dozens of
objects.
He needed a way to recursively find all the depending objects of a
given stored procedure. I wrote a recursive query using CTE to achieve
this.

The single best tool for reverse engineering is by APEX. It's amazing. It can even trace into .NET assemblies and tell you where the procs are used. It's by far the deepest product of its kind. RedGate has other great tools, but not in this case.

Related

Tip: how to get a list of objects in a SQL Server database

Sometimes I complete the development of a solution on a development instance of a SQL database after several test cycles, reworks, adjustments, etc.
When it is time to move everything into production, I have to be sure that nothing is left behind by mistake (a trigger, a stored procedure).
One helper is to get a simple list of all the objects in the database.
A simple query to get the list of all the objects in your db is the following one:
select * from sys.objects as o left join sys.schemas as s on o.schema_id = s.schema_id
where is_ms_shipped = 0
order by s.name, o.type, o.name
You can adjust it to your needs, remove fields, and so on.
A few notes:
I have excluded items delivered by Microsoft (is_ms_shipped)
I have joined the main system table, which catalogues all the objects, with the schema descriptions. This is helpful for ordering all the elements
Using SSMS (SQL Server Management Studio), you can easily export the results in Excel, and work on it, add it to the documentation.

legacy table, investigate how it gets updated?

I have a legacy database that is a mess. I need to investigate a specific table that gets synced/updated using several sources… I need to know when and how the table gets updated.
How can I retrieve all the sources used to update/sync this table? (I guess it’s mainly done through different jobs using SPs).
Is there a way to search in all SPs for ‘%table name%’? (It is the only way I can think of. Is there any other reasonable way?)
Then I would just need to check which jobs are running those SPs, and I could get a better picture…
This will generate a list of all procs that refer to an object 'UserInfo':
> select object_name(object_id) from sys.sql_modules where
> charindex('userinfo',definition)>0
This will not search SSIS or BCP packages which typically are on the file system or in the MSDB database. Many times there are jobs that invoke BCP and/or SSIS packages that update data.
To inspect only procedures you can use:
> select object_name(sm.object_id) from sys.sql_modules sm inner join
> sys.objects so on sm.object_id=so.object_id where
> charindex('userinfo',definition)>0 and type='P'
You could try this approach too (looking for ones where is_updated is 1) but I would combine it with other approaches as I haven't found this to be 100% reliable.
DECLARE @TwoPartName nvarchar(500) = '[dbo].[YourTable]';
SELECT referencing_schema_name,
referencing_entity_name,
MAX(0 + is_selected) is_selected,
MAX(0 + is_updated) is_updated
FROM sys.dm_sql_referencing_entities (@TwoPartName, 'OBJECT')
CROSS APPLY sys.dm_sql_referenced_entities (QUOTENAME(referencing_schema_name) + '.'
+ QUOTENAME(referencing_entity_name), 'OBJECT') CA2
WHERE CA2.referenced_id = OBJECT_ID(@TwoPartName)
GROUP BY referencing_schema_name,
referencing_entity_name;
You can start by generating the "CREATE" scripts via the "Tasks / Generate Scripts..." command ...
...for the stored procedures and functions defined in your database.
Then you can search the generated SQL files to see where your table of interest is referenced.
This will not indicate anything about applications with internal SQL that updates your table, but it is a good start for your analysis.
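Searching the generated script files can be automated with a few lines of Python. This is a rough sketch under stated assumptions: the scripts are `.sql` files saved as UTF-8 in one folder tree (SSMS may write other encodings, in which case the `encoding` argument needs adjusting), and the helper name `files_referencing` is made up for this example. It matches the table name only as a whole identifier, so `UserInfoArchive` is not reported when looking for `UserInfo`.

```python
import re
from pathlib import Path


def files_referencing(folder, table_name):
    """Return the sorted names of .sql files under `folder` that mention
    `table_name` as a whole identifier (optionally in [brackets])."""
    # The lookarounds stop the name from matching inside a longer identifier.
    pattern = re.compile(
        r"(?<!\w)\[?" + re.escape(table_name) + r"\]?(?!\w)",
        re.IGNORECASE,
    )
    hits = []
    for path in sorted(Path(folder).rglob("*.sql")):
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        if pattern.search(text):
            hits.append(path.name)
    return hits
```

A call such as `files_referencing("scripts", "UserInfo")` (the folder name is hypothetical) returns the script files worth reading first. Hits inside comments are still reported, so the result is a shortlist rather than a definitive answer.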

How do I find all stored procs and functions that change data in a given table?

It is easy to find all stored procs that “depends” on a given table by using Juneau (CTP3) or SQL Dependency Tracker (from RedGate).
However, we have 100s of stored procs that just select from the given table, which makes it very time consuming to look at the results from Juneau.
I need to find the procs that insert/update/delete data from the table.
(A search with a complex regex, is not a solution that will work!)
With the same caveats as Christian, that there isn't really a way to be 100% certain that a stored procedure updates your table and not another, this method has a couple of improvements:
it uses sys.sql_modules, so no chance of missing a hit due to a boundary, or not capturing all of the text, for procs > 4k
it doesn't parse the object text for the table name, which can lead to a lot of false positives (table name in comments only, table name is part of a larger name)
it generates an sp_helptext command for each potential match, so you can copy & paste the output into the top pane, run it, and quickly scan to figure out if there are any false positives.
Code:
SELECT 'EXEC sp_helptext '''
+ QUOTENAME(SCHEMA_NAME(p.[schema_id]))
+ '.' + QUOTENAME(p.name) + ''';'
FROM sys.procedures AS p
INNER JOIN sys.sql_modules AS m
ON p.[object_id] = m.[object_id]
INNER JOIN sys.sql_expression_dependencies AS d
ON p.[object_id] = d.referencing_id
WHERE d.referenced_id = OBJECT_ID('dbo.your_table_name')
AND
(
LOWER(m.[definition]) LIKE '%update%'
OR LOWER(m.[definition]) LIKE '%insert%'
OR LOWER(m.[definition]) LIKE '%delete%'
);
Now one weakness is that sys.sql_expression_dependencies isn't 100% dependable - but I'd still prefer to do it this way for the above reasons.
I wrote a pretty lengthy article about maintaining dependencies a while back:
Keeping sysdepends up to date in SQL Server 2008
You can query the system views for that.
Here is an example how to find all SPs which are related to a certain table.
With a bit of modification, you can find only those that actually contain the keywords delete, insert and update:
SELECT DISTINCT so.name, sc.text
FROM syscomments sc
INNER JOIN sysobjects so ON sc.id=so.id
WHERE (sc.TEXT LIKE '%your_table%' AND sc.TEXT LIKE '%delete%')
OR (sc.TEXT LIKE '%your_table%' AND sc.TEXT LIKE '%insert%')
OR (sc.TEXT LIKE '%your_table%' AND sc.TEXT LIKE '%update%')
This is not a perfect solution (for example, it will also find SPs which SELECT from your table and DELETE from another), but if you have hundreds of SPs which only SELECT from your table and do nothing else, at least these will be filtered out.
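One way to reduce the false positives described above (a proc that SELECTs from your table and DELETEs from another) is to require the DML keyword to be followed directly by the table name. The sketch below works on procedure definitions that have already been pulled out of the database as text; the function name `writes_to_table` is hypothetical. It is a heuristic only: it will miss dynamic SQL and updates made through an alias (`UPDATE o ... FROM Orders o`), and it can still match text inside comments.

```python
import re


def writes_to_table(definition, table_name):
    """Return True if `definition` appears to INSERT, UPDATE, DELETE or
    MERGE directly against `table_name` (heuristic, not a SQL parser)."""
    name = re.escape(table_name)
    # Optional schema prefix and brackets: T, dbo.T, [dbo].[T]
    target = r"(?:\[?\w+\]?\s*\.\s*)?\[?" + name + r"\]?(?!\w)"
    pattern = re.compile(
        r"\b(?:insert\s+(?:into\s+)?"
        r"|update\s+"
        r"|delete\s+(?:from\s+)?"
        r"|merge\s+(?:into\s+)?)" + target,
        re.IGNORECASE,
    )
    return bool(pattern.search(definition))


if __name__ == "__main__":
    print(writes_to_table("UPDATE [dbo].[Orders] SET Status = 1", "Orders"))
    print(writes_to_table("SELECT * FROM Orders; DELETE FROM Audit", "Orders"))
```

Running it over every definition and reviewing only the matches is still manual work, but the list should be shorter than with a plain keyword search.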
In the end, I just did a string search over the folders that store the master copy of the stored procs. It took a few hours to look at all the hits, but that was quicker than trying to write my own tool.
(I don’t understand why the tooling for SQL is so limited compared to C# for this sort of thing)
Try sp_helptrigger 'table name' to list the triggers on the table, then sp_helptext 'triggername' to see the code and view the manipulation part in the trigger.

How to stop stored procs from whining about a missing column that I am about to delete in SQL Server 2008?

I am deleting a column from one of the frequently used tables in my database.
Last time I did this, errors started to crop up from all sorts of stored procs that I had long forgotten existed, complaining about the missing column (only when that stored proc was called).
So, I don't want to get wound up with these things again. I want to make sure all stored procs are free of that column before I actually delete it.
What is the best way to search through all stored procs (and I have quite a lot of them) and remove the reference to that column?
I tried to find an option in the menu to do this, but I did not find anything obvious.
Any help (other than telling me to go through them all one by one) is appreciated.
PS: of course, that doesn't mean I will deprecate your comment if you do tell me. I will only downvote :P
(nah, just kidding!)
To add to the various TSQL solutions, there is a free tool from Red Gate that integrates into SSMS: SQL Search
Use this script. It will also return triggers. If many tables have a column with the same name, you can add the table name to the WHERE clause too. This script works on MSSQL 2000 and 2005. I haven't tested it on 2008, but it should work fine too.
SELECT o.name
FROM sysobjects o
INNER JOIN syscomments c ON o.id = c.id
WHERE c.text like '%column_name%'
Edit: If you want to filter it to stored procedures only, add AND type = 'P' to the WHERE clause
Red Gate Software's SQL Prompt 5 has a couple of new features that might be useful in this situation:
Column Dependencies: hover over a column name in a script and up pops a window containing a list of all the objects that use that column
Find Invalid Objects: show objects across the database that can't be used, often because they use columns that have been deleted
You can download a 14-day free trial to see if the tool would be useful for you.
Paul Stephenson
SQL Prompt Project Manager
Red Gate Software
You can use the dependencies option for that table to find the dependent objects, i.e. the list of procedures or functions which depend on this table.
Use the script below:
sp_depends 'TableName'
Another option is to generate the CREATE scripts and search them for the column name, but that will match against all of the text in each procedure or function.
EDIT: sorry, my bad. here's the code for searching within the stored procedure's code
The following stored procedure should be able to list all the stored procedures whose text contain the desired string (so, place your column name in it and fire away):
CREATE PROCEDURE Find_Text_In_SP
@StringToSearch varchar(100)
AS
SET @StringToSearch = '%' + @StringToSearch + '%'
SELECT Distinct SO.Name
FROM sysobjects SO (NOLOCK)
INNER JOIN syscomments SC (NOLOCK) on SO.Id = SC.ID
AND SO.Type = 'P'
AND SC.Text LIKE @StringToSearch
ORDER BY SO.Name
GO
Usage:
exec Find_Text_In_SP 'desired_column_name'
Source here
If you use a version of MS SQL Server later than 2000, it's better to search sys.sql_modules rather than sys.syscomments, since syscomments only holds records of nvarchar(4000), and the text you are looking for may be split across two records.
So while you can use a query like this from MSDN
SELECT sm.object_id, OBJECT_NAME(sm.object_id) AS object_name, o.type, o.type_desc, sm.definition
FROM sys.sql_modules AS sm
JOIN sys.objects AS o ON sm.object_id = o.object_id
WHERE sm.definition like '%' + @ColumnName + '%'
ORDER BY o.type;
you should be aware that this search finds any procedure containing that text, regardless of whether it is an actual column name and which table the column belongs to.
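The split-record problem with syscomments can be illustrated without a database. The short simulation below is purely illustrative: `chunk` mimics how a long definition is stored in 4000-character pieces, and the helper names are made up for this example. A search that looks at each piece separately misses a column name that straddles the boundary, while a search over the whole text (which is what sys.sql_modules gives you) finds it.

```python
def chunk(text, size=4000):
    """Split `text` into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def found_per_chunk(chunks, needle):
    """Search each piece separately, like a LIKE on each syscomments row."""
    return any(needle in piece for piece in chunks)


def found_in_whole(chunks, needle):
    """Search the re-assembled definition."""
    return needle in "".join(chunks)


if __name__ == "__main__":
    # Place the column name so that it straddles the 4000-character boundary.
    definition = "x" * 3995 + "customer_id" + "y" * 100
    pieces = chunk(definition)
    print(found_per_chunk(pieces, "customer_id"))
    print(found_in_whole(pieces, "customer_id"))
```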

Find unused stored procedures in code?

Is there an easier way of cleaning up a database that has a ton of stored procedures, some of which I'm sure aren't used anymore, than searching for them one by one?
I'd like to search my Visual Studio solution for stored procedures and see which ones from the database aren't used any longer.
You could create a list of the stored procedures in the database. Store them into a file.
SELECT *
FROM sys.procedures;
Then read that file into memory and run a large regex search across the source files for each entry. Once you hit the first match for a given stored procedure, you can move on.
If there are no matches for a given stored procedure, you probably can look more closely at that stored procedure and may even be able to remove it.
I'd be careful removing stored procedures - you also need to check that no other stored procedures depend on your candidate for removal!
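The search described above could look roughly like the following Python sketch. It assumes the procedure names have already been exported from sys.procedures into a list, and that the source files are UTF-8 text; the function name `unused_procedures` and the default file extensions are assumptions for illustration. A name is dropped from the candidate list on its first match, so later files are only checked for names not yet seen.

```python
import re
from pathlib import Path


def unused_procedures(proc_names, source_root, extensions=(".cs", ".sql")):
    """Return the procedure names that never appear (as whole identifiers,
    case-insensitively) in any source file under `source_root`."""
    remaining = {name.lower(): name for name in proc_names}
    for path in Path(source_root).rglob("*"):
        if not remaining:
            break  # every procedure has been seen at least once
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        seen = [
            key for key in remaining
            if re.search(r"(?<!\w)" + re.escape(key) + r"(?!\w)", text)
        ]
        for key in seen:
            del remaining[key]
    return sorted(remaining.values())
```

The result is a list of candidates to investigate, not a list that is safe to drop: procedures called from other procedures, jobs, or ad hoc queries will not show up in application source.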
I would use the profiler and set up a trace. The big problem is tracking SPs which are only used monthly or annually.
Anything not showing up in the trace can be investigated. I sometimes instrument individual SPs to log their invocations to a table and then review the table for activity. I've even had individual SPs instrumented to send me email when they are called.
It is relatively easy to ensure that an SP is not called from anywhere else in the server by searching INFORMATION_SCHEMA.ROUTINES, or in source code with GREP. It's a little harder to check in SSIS packages and jobs.
But none of this eliminates the possibility that there might be the occasional SP which someone calls manually each month from SSMS to correct a data anomaly or something.
Hmmm... you could search your solution for code which calls a stored proc, like this (from the DAAB)
using (DbCommand cmd = DB.GetStoredProcCommand("sp_blog_posts_get_by_title"))
{
DB.AddInParameter(cmd, "@title", DbType.String, title);
using (IDataReader rdr = DB.ExecuteReader(cmd))
result.Load(rdr);
}
Search for the relevant part of the first line:
DB.GetStoredProcCommand("
Copy the search results from the "find results" pane, and compare to your stored proc list in the database (which you can generate with a select from the sysObjects table if you're using SQL Server).
If you really want to get fancy, you could write a small app (or use GREP or similar) to perform a regex match against your .cs files to extract a list of stored procedures, sort the list, generate a list of stored procs from your database via select from sysobjects, and do a diff. That might be easier to automate.
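The "small app" idea can be sketched in Python as follows. It assumes every call in the code base goes through `GetStoredProcCommand("...")` with a literal procedure name, as in the snippet above; calls that build the name dynamically or use another API will be missed. The function names and the sample inputs are hypothetical.

```python
import re

# Matches GetStoredProcCommand("proc_name") and captures the literal name.
CALL = re.compile(r'GetStoredProcCommand\(\s*"([^"]+)"')


def procs_called(source_texts):
    """Collect the lower-cased procedure names referenced in the C# sources."""
    names = set()
    for text in source_texts:
        names.update(match.lower() for match in CALL.findall(text))
    return names


def never_called(db_procs, source_texts):
    """Return the database procedures that no source text references."""
    called = procs_called(source_texts)
    return sorted(proc for proc in db_procs if proc.lower() not in called)


if __name__ == "__main__":
    sources = ['using (DbCommand cmd = DB.GetStoredProcCommand("sp_blog_posts_get_by_title"))']
    print(never_called(["sp_blog_posts_get_by_title", "sp_old_report"], sources))
```

In practice `source_texts` would be the contents of the .cs files and `db_procs` the names selected from the database, so the output is the diff described above.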
UPDATE Alternatively, see this link. The author suggests setting up a trace for a period of a week or so and comparing your list of procs against those found in the trace. Another author suggested: (copied)
-- Unused tables & indexes. Tables have index_id’s of either 0 = Heap table or 1 = Clustered Index
SELECT OBJECTNAME = OBJECT_NAME(I.OBJECT_ID), INDEXNAME = I.NAME, I.INDEX_ID
FROM SYS.INDEXES AS I
INNER JOIN SYS.OBJECTS AS O
ON I.OBJECT_ID = O.OBJECT_ID
WHERE OBJECTPROPERTY(O.OBJECT_ID,'IsUserTable') = 1
AND I.INDEX_ID
NOT IN (SELECT S.INDEX_ID
FROM SYS.DM_DB_INDEX_USAGE_STATS AS S
WHERE S.OBJECT_ID = I.OBJECT_ID
AND I.INDEX_ID = S.INDEX_ID
AND DATABASE_ID = DB_ID(db_name()))
ORDER BY OBJECTNAME, I.INDEX_ID, INDEXNAME ASC
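which should find tables and indexes that have no recorded usage. Keep in mind that sys.dm_db_index_usage_stats only covers activity since its counters were last reset (for example, at the last SQL Server restart), not since an arbitrary date. Note that I haven't tried either of these approaches, but they seem reasonable.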
