Parse SQL scripts to find table dependencies and outputs - sql-server

I currently have a large set of SQL scripts transforming data from one table to another, often in steps, for example:
select input3.id as cid
, input4.name as cname
into #temp2
from input3
inner join input4
on input3.match = input4.match
where input3.regdate > '2019-01-01';
truncate table output1;
insert into output1 (customerid, customername)
select cid, cname from #temp2;
I would like to "parse" these scripts into their basic inputs and outputs
in: input3, input4
out: output1
(not necessarily this format, just this info)
It would not be a problem if the temporary tables were falsely flagged as well:
in: input3, input4, #temp2
out: #temp2, output1
It is OK to take a little bit of time, but the more automatic, the better.
How would one do this?
Things I tried include
regexes (straightforward, but they will miss edge cases, mainly by falsely flagging tables mentioned in comments)
using an online parser to list the DB objects, then postprocessing by hand
solving it programmatically, but writing, say, a C# program for this would cost too much time

I usually wrap the scripts' content into stored procedures and deploy them into the same database where the tables are located. If you are sufficiently acquainted with (power)shell scripting and regexps, you can even write the code which will do it for you.
From this point on, you have some alternatives:
If you need a complete usage / reference report, or it's a one-off task, you can utilise sys.sql_expression_dependencies or other similar system views (see the sketch after this list);
Create an SSDT database project from that database. Among many other things that make database development easier and more consistent, SSDT has the "Find all references" functionality (Shift+F12 hotkey) which displays all references of a particular object (or column) across the code.
AFAIK neither of them sees through dynamic SQL, so if you have lots of it, you'll have to look elsewhere.
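For the system-view route, here is a minimal sketch, assuming each script has been wrapped into a stored procedure named dbo.LoadOutput1 (the procedure name is a placeholder). sys.dm_sql_referenced_entities exposes is_selected / is_updated flags, which let you split the references into inputs and outputs:
select referenced_schema_name,
       referenced_entity_name,
       is_selected, -- 1 = read from (an "input")
       is_updated   -- 1 = written to (an "output")
from sys.dm_sql_referenced_entities('dbo.LoadOutput1', 'OBJECT')
where referenced_minor_id = 0; -- object-level rows only, skip per-column rows
Note that temporary tables such as #temp2 are not tracked by these views, so they simply drop out of the report rather than being falsely flagged.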

Related

How to systematically manage a Big List of Queries and Tables in SQL Server?

Suppose someone has to work on a lot of different SQL Server Databases, each containing a lot of Tables and Queries / Views.
After a period of time, it becomes very difficult to remember exactly what columns are present within a given Table or View.
Please suggest some method by which one can keep a systematic list of all the Tables and Views that are present within a SQL Server Database, along with the columns that are present within them.
Are there any Add-on products or services etc. available that helps in making this type of work systematic?
Currently I add comments to each query inside SQL Server to remind me of what the query is doing, but this method is not great. I am looking for better and more efficient methods.
Please share any ideas that you might have in this direction.
Thanks a lot
You may find the following useful for each database.
select s.name, s.type, c.name, s.refdate
from syscolumns c
inner join sysobjects s on s.id = c.id
where s.xtype in ('U', 'V')
order by s.refdate -- use refdate for manual quick looks;
                   -- use s.name for file output and long-term analysis
I output this to text files with the exact same format and check them into source control for each database. I even make comments about fields as things change. This is not part of the formal process; it is just big-picture sanity tracking, independent of the formal deployments.
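If you prefer the current catalog views (sysobjects and syscolumns are deprecated compatibility views), a roughly equivalent query is the following, with modify_date standing in for refdate:
select o.name, o.type_desc, c.name as column_name, o.modify_date
from sys.columns c
inner join sys.objects o on o.object_id = c.object_id
where o.type in ('U', 'V')
order by o.modify_date; -- or o.name, c.column_id for file output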

Safe/reliable/standard process for making major changes to a database with existing data?

I would like to take one table that is heavy with flags and fields, and break it into smaller tables. The parent table to be revised/broken down already contains live data that must be handled with care.
Here is my plan of attack, which I'm hoping to execute this weekend while no one is using the system.
Create the new tables that we will need
Rename the existing parent table, ParentTable, to ParentTableOLD
Create a new table called ParentTable with the unneeded fields gone, and new fields added
Run a procedure to copy the entries in ParentTableOLD to the new tables, mapping old data to new tables/fields where applicable
Delete the ParentTableOLD table
The above seems pretty reasonable and simple to me; I'm fairly certain it will work. I'm interested in other techniques to achieve this (the above is the only thing I can think of), as well as any kind of tools to help stay organized. Right now I'm running on pen and paper.
Reason I ask is that several times now, I've been re-inventing the wheel just because I didn't know any better, and someone more experienced came along and saw what I was doing and said, "oh there's a built-in way to help do this," or, "there's a simpler way to do this." I did coding for months and months with Visual Studio before someone stopped by and said "you know about breakpoints to step through the code, yeah?" --- life changing, hah.
I have SQL Server 2008 R2 with SSMS.
A good trick to assist you in creating your '_old' tables is:
SELECT *
INTO mytable_old
FROM mytable
SELECT INTO will copy all of the data and create your table for you in one step.
This said - I would actually retain the current table names and instead copy everything into another schema. This will make adapting queries and reports to run over the old schema (where needed) a lot easier than having to add '_old' to all the names (since instead you can just find/replace the schema names).
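A minimal sketch of that schema-based approach, combined with the SELECT INTO trick above (the schema name 'old' is an assumption):
create schema old;
go
select *
into old.ParentTable
from dbo.ParentTable;
Since the table name itself is unchanged, queries and reports that need the old structure only require a find/replace of the schema prefix.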
If at all possible, I'd be doing this in some sort of test environment first and foremost. If you have external applications that rely on the database, then make sure they all run against your modified structure without any hiccups.
Also do a search on your database objects that might reference the table you are going to rename. For example:
SELECT Name
FROM sys.procedures
WHERE OBJECT_DEFINITION(OBJECT_ID) LIKE '%MyTable%'
Try to ensure some sort of functional equivalence between queries over your new and old schemas. Have some queries that can be run against your renamed table, and then have reworked versions of them referencing your new table structure. This way you can make sure the data returned is the same for both structures. If possible, prepare these all ahead of time so that it is simply a series of checks you can do once you've made your modifications; if there are differences, they can help you decide whether to proceed with the change or back it out.
Lastly, have a plan for reverting to the old schema if something catastrophic were to occur. If you'd been working with your new table structure for a period of time and then discovered a major issue, could you revert to the old table and successfully get the data out of your modified table structure back into it? Basically, just follow the boy scout rule and be prepared.
This isn't really an answer for your overall problem, but a couple of tools that you might find useful for your Step 4 are RedGate's SQL Compare and Data Compare. SQL Compare will perform schema migrations, and Data Compare will help migrate data. You can move data to new columns and new tables, populate default values, and sync from dev to production, among other things.
You can make your changes in a dev environment with production data, and when you're satisfied with the process, do the actual migration in production.
Make a backup of the database (for reference: http://msdn.microsoft.com/en-us/library/ms187510.aspx) and then perform the required steps. If everything goes fine, go ahead; otherwise restore the old database (for reference: http://msdn.microsoft.com/en-us/library/ms177429.aspx).
You can even automate the backup process to run, say, every week.
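A minimal sketch of that backup/restore pair (database name and file path are assumptions):
backup database MyDb
to disk = N'C:\Backups\MyDb_pre_migration.bak'
with init, name = N'Pre-migration full backup';
-- if the changes go wrong, roll back to the backup:
restore database MyDb
from disk = N'C:\Backups\MyDb_pre_migration.bak'
with replace;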

How do I compare SQL Server databases and update another?

I am currently dealing with a scenario whereby I have two SQL databases and need to compare the data in each of the tables across the two databases. The databases have exactly the same tables but may contain differing data. The first hurdle is how to compare the data sets; the second challenge is, once I have identified the differences, how to, for example, bring over to database 2 data that is missing there but exists in database 1.
Why not use SQL Data Compare instead of re-inventing the wheel? It does exactly what you're asking - compares two databases and writes scripts that will sync in either direction. I work for a vendor that competes with some of their tools and it's still the one I would recommend.
http://www.red-gate.com/products/sql-development/sql-data-compare/
One powerful command for comparing data is EXCEPT. With this, you can compare two tables with same structure simply by doing the following:
SELECT * FROM Database1.dbo.Table
EXCEPT
SELECT * FROM Database2.dbo.Table
This will give you all the rows that exist in Database1 but not in Database2, including rows that exist in both but are different (because it compares every column). Then you can reverse the order of the queries to check the other direction.
Once you have identified the differences, you can use INSERT or UPDATE to transfer the changes from one side to the other. For example, assuming you have a primary key field PK, and new rows only come into Database2, you might do something like:
INSERT INTO Database1.dbo.Table
SELECT T2.*
FROM Database2.dbo.Table T2
LEFT JOIN Database1.dbo.Table T1 on T2.PK = T1.PK
WHERE T1.PK IS NULL -- insert rows that didn't match, i.e., are new
The actual methods used depend on how the two tables are related together, how you can identify matching rows, and what the sources of changes might be.
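If existing rows can change as well as new rows appear, a MERGE can handle the update and the insert in one statement. A hedged sketch (table and column names are assumptions; the EXISTS/EXCEPT predicate is a null-safe way of detecting changed rows):
MERGE Database1.dbo.Customers AS t
USING Database2.dbo.Customers AS s
    ON t.PK = s.PK
WHEN MATCHED AND EXISTS (SELECT s.Name, s.Email
                         EXCEPT
                         SELECT t.Name, t.Email)
    THEN UPDATE SET t.Name = s.Name, t.Email = s.Email
WHEN NOT MATCHED BY TARGET
    THEN INSERT (PK, Name, Email) VALUES (s.PK, s.Name, s.Email);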
You can also look at the Data Compare feature in Visual Studio 2010 (Premium and higher). I use it to make sure configuration tables in all my environments (i.e. development, test, and production) are in sync. It has made my life enormously easier.
You can select the tables you want to compare, and you can choose the columns to compare. What I haven't learned to do, though, is save my selections for future use.
You can do this with SQL Compare, which is a great tool for development, but if you want to do this on a scheduled basis, a better solution might be Simego's Data Sync Studio. I know it can do about a 100m-row (30 columns wide) compare on 16GB on an i3 iMac (Boot Camp). In reality it is comfortable with 1m-20m rows on each side. It uses a column storage engine.
In this scenario it would only take a couple of minutes to download, install, and test.
I hope this helps as I always look for the question mark to work out what someone is asking.

What's the best way to convert one Oracle table (data) to fill a slightly different Oracle table?

I have two Oracle tables, an old one and a new one.
The old one was poorly designed (more so than mine, mind you) but there is a lot of current data that needs to be migrated into the new table that I created.
The new table has new columns, different columns.
I thought of just writing a PHP script or something with a whole bunch of string replacement... clearly that's a stupid way to do it though.
I would really like to be able to clean up the data a bit along the way as well. Some of it was stored with markup in it (ex: "First Name"), lots of blank space, etc., so I would really like to fix all that before putting it into the new table.
Does anyone have any experience doing something like this? What should I do?
Thanks :)
I do this quite a bit - you can migrate with a simple select statement:
create table newtable as
select field1,
       trim(oldfield2) as field3,
       cast(field3 as number(6)) as field4,
       (select pk from lookuptable where value = field5) as field5
       -- ...and so on
from oldtable;
There's really very little you can do with an intermediate language like PHP that you can't do in native SQL when it comes to cleaning and transforming data.
For more complex cleanup, you can always create a SQL function that does the heavy lifting, but I have cleaned up some pretty horrible data without resorting to that. Don't forget that in Oracle you have DECODE, CASE expressions, etc.
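For instance, a hedged sketch of that kind of in-database cleanup (column names are assumptions): strip simple markup tags, collapse runs of whitespace, and decode a flag, all inside the one statement:
create table newtable as
select trim(regexp_replace(old_name, '<[^>]+>', '')) as name,
       regexp_replace(old_notes, '[[:space:]]+', ' ') as notes,
       case when old_flag = 'Y' then 1 else 0 end as is_active
from oldtable;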
I'd check out an ETL tool like Pentaho Kettle. You'll be able to query the data from the old table, transform and clean it up, and re-insert it into the new table, all with a nice WYSIWYG tool.
Here's a previous question I answered regarding data migration and manipulation with Kettle.
Using Pentaho Kettle, how do I load multiple tables from a single table while keeping referential integrity?
If the data volumes aren't massive and if you are only going to do this once, then it will be hard to beat a roll-it-yourself program. Especially if you have some custom logic you need implemented.
The time taken to download, learn & use a tool (such as Pentaho etc.) will probably not be worth your while.
Coding a select *, updating columns in memory & doing an insert can be done quickly in PHP or any other programming language.
That being said, if you find yourself doing this often, then an ETL tool might be worth learning.
I'm working on a similar project myself - migrating data from one model containing a couple of dozen tables to a somewhat different model of similar number of tables.
I've taken the approach of creating a MERGE statement for each target table. The source query gets all the data it needs, formats it as required, then the merge works out if the row already exists and updates/inserts as required. This way, I can run the statement multiple times as I develop the solution.
Depends on how complex the conversion process is. If it is easy enough to express in a single SQL statement, you're all set; just create the SELECT statement and then do the CREATE TABLE / INSERT statement. However, if you need to perform some complex transformation or (shudder) split or merge any of the rows to convert them properly, you should use a pipelined table function. It doesn't sound like that is the case, though; try to stick to the single statement as the other Chris suggested above. You definitely do not want to pull the data out of the database to do the transform as the transfer in and out of Oracle will always be slower than keeping it all in the database.
A couple more tips:
If the table already exists and you are doing an INSERT...SELECT statement, use the /*+ APPEND */ hint on the insert so that you are doing a bulk operation. Note that CREATE TABLE does this by default (as long as it's possible; you cannot perform bulk ops under certain conditions, e.g. if the new table is an index-organized table, has triggers, etc.).
If you are on 10.2 or later, you should also consider using the LOG ERRORS INTO clause to log rejected records to an error table. That way, you won't lose the whole operation if one record has an error you didn't expect.
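A hedged sketch combining both tips (table and column names are assumptions; DBMS_ERRLOG.CREATE_ERROR_LOG creates the err$_ table that LOG ERRORS writes to):
-- one-off setup of the error table (default name err$_newtable), e.g. from SQL*Plus:
exec dbms_errlog.create_error_log('NEWTABLE');
insert /*+ APPEND */ into newtable (id, name)
select id, trim(name)
from oldtable
log errors into err$_newtable ('migration run 1')
reject limit unlimited;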

SQL equivalent of "using" for schemas?

I'm working with a SQL Server DB that's got tables spread across multiple schemas (not my idea), so queries end up looking like this:
select col1, col2
from some_ridiculously_long_schema_name.table1 t1
inner join
another_really_long_schema_location.table2 t2
on...
... you get the idea.
This is a small inconvenience when I put queries into stored procs, etc., but when I'm doing ad hoc queries, this gets to be a real pain.
Is there some way I could "include" all the schemas I'm interested in, and have them automatically addressable? (LINQPad does this.)
I'd love to be able to indicate something like this:
using some_ridiculously_long_schema_name, another_really_long_schema_location
... and then query away, with those schemas included in my address space.
If nothing like this exists, I'll look into synonyms, but I'd prefer to do this without having to add artifacts into the DB.
Red-Gate sells an SQL tool that adds intellisense to server management studio. Never tried it but it might help cut down on the keystrokes: http://www.red-gate.com/products/SQL_Prompt/index.htm
I know how you feel. If you need to keep the schemas (for example, if you have the same table names in each) and you are consistently writing queries that join across the schemas, the best suggestion I can offer is to shorten your schema names.
Low tech and not what you wanted to hear I am sure.
Synonyms, as suggested above, only work at the object level (you can't have a synonym for a whole schema, as far as I know), so you would have to have a synonym for every table, view, stored proc, function, etc. that you wanted to use from outside your default schema.
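For completeness, a minimal sketch of that per-object workaround (the short synonym names and join columns are assumptions):
create synonym t1 for some_ridiculously_long_schema_name.table1;
create synonym t2 for another_really_long_schema_location.table2;
select t1.col1, t2.col2
from t1
inner join t2 on t1.id = t2.id; -- join columns are assumptions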
No, it doesn't. Synonyms are the only way.
That will not work, because if you have Table1 in both schemas, how would you know which schema you want?
