I'm a very young software architect. I'm now working at a very large bank, and I have to lead a group of developers in rewriting the bank's entire mortgage system.
I'm looking at the database tables and I realize that there is no data model and no documentation. The worst part is that there are about 1000 tables in the dev environment and about 600 in production. I trust the production environment more, but either way, what can I do? I could despair, but is there a good reverse engineering tool, so that I could at least get the schema definition with the relations between tables and the comments extracted from the fields? Can you advise me?
Thanks in advance.
If you are lucky and the database actually uses primary and foreign keys, you can get some excellent documentation with SchemaSpy, a nice command line tool written in Java.
Update: I've just remembered that Oracle SQL Developer has a similar tool (create a connection, right click on its icon and choose "Generate DB Doc") though it doesn't draw graphs.
Try connecting with a tool called TOAD. It has been some years since I used it, but IIRC you can select the schema that you want to inspect and it will give you a tree view with all the tables and views, and you can expand the table nodes to see the column details. No doubt the tool has moved on considerably since I last used it.
You can extract the comments like so
select * from dba_tab_comments
where owner not in ('SYS', 'SYSTEM')
and
select * from dba_col_comments
where owner not in ('SYS', 'SYSTEM')
As for reverse engineering: if you're going to draw an ERD with 600+ tables, it is (probably) going to be too large to be useful anyway. I'd first try to find "clusters" of related tables and then use a specialized tool to draw these clusters.
Obviously, you want to make sure that the entire schema has foreign keys enforced. You might want to look at
select * from dba_constraints
where constraint_type = 'R' and
owner not in ('SYS', 'SYSTEM')
to see whether all the foreign keys you expect are actually defined.
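Once you have the foreign-key rows from a query like the one above, finding the "clusters" is a connected-components problem. Here is a minimal sketch in Python, assuming you have already reduced the constraint rows to (child table, referenced table) pairs. The table names are hypothetical, and the function name cluster_tables is my own.

```python
from collections import defaultdict

def cluster_tables(tables, fk_edges):
    """Group tables into clusters that are connected by foreign keys."""
    parent = {t: t for t in tables}

    def find(t):
        # Walk up to the cluster representative, compressing the path as we go.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # A foreign key links two tables regardless of its direction.
    for child, referenced in fk_edges:
        parent[find(child)] = find(referenced)

    clusters = defaultdict(set)
    for t in list(parent):
        clusters[find(t)].add(t)
    # Largest clusters first; tables without foreign keys end up alone.
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))

# Hypothetical table names, for illustration only.
tables = ["LOAN", "CUSTOMER", "PAYMENT", "BRANCH", "AUDIT_LOG"]
fk_edges = [("LOAN", "CUSTOMER"), ("PAYMENT", "LOAN")]
clusters = cluster_tables(tables, fk_edges)
for cluster in clusters:
    print(sorted(cluster))
```

Each cluster is then small enough to hand to a diagramming tool on its own. Tables that come out alone either really are standalone or are missing their foreign keys.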
Unfortunately this is Oracle. For MySQL there is a pretty nice tool called MySQL Workbench.
I've been doing some research on this topic for a while now and can't seem to find a similar instance to my issue. I will try and explain everything as best I can, as simply as I can.
The problem is in the title: I am trying to migrate data from an Access database to SQL Server. Typically, this isn't really a hard problem, as there exist several import/export tools within SQL Server, but I am looking for the best solution. That, or some advice/tips, as I am somewhat new to database migration. I will now begin to explain my situation.
So I am currently working on migrating data that exists in an Access “database” (database in quotes because I don’t think it is actually a database, you’ll know why in a minute) in an un-normalized form. What I mean by un-normalized is that all of the data is in one table. This table has about 150+ columns and the rows number in the thousands. Yikes, I know; this is what I’ve walked into lol. Anyways, sitting down and sorting through everything, I’ve designed relationships for the data that normalize it nicely in its new home, SQL Server. Enter my predicament (or at least part of it). I have the normalized database set up to hold the data but I’m not sure how to import it, massage/cut it up, and place it in the respective tables I’ve set up.
Thus far I’ve done a bunch of research into what can be done and for starters I have found out about the SQL Server Migration Assistant. I’ve begun messing with it and was able to import the data from Access into SQL Server, but not in the way I wanted. All I got was a straight copy & paste of the data into my SQL Server database, exactly as it was in the Access database. I then learned about the typical practice of setting up a global table/staging area for this type of migration, but I am somewhat of a novice when it comes to using TSQL. The heart of my question comes down to this; Is there some feature in SQL Server (either its import/export tool or the SSMA) that will allow me to send the data to the right tables that already exist in my normalized SQL Server database? Or do I import to the staging area and write the script(s) to dissect and extract the data to the respective normalized table? If it is the latter, can someone please show me some tips/examples of what the TSQL would look like to do this sort of thing. Obviously I couldn’t expect exact scripts from anyone without me sharing the data (which I don’t have the liberty of as it is customer data), so some cookie cutter examples will work.
Additionally, future data is going to come into the new database from various sources (like maybe excel for example) so that is something to keep in mind. I would hate to create a new issue where every time someone wants to add data to the database, a new import, sort, and store script has to be written.
Hopefully this hasn’t been too convoluted and someone will be willing (and able) to help me out. I would greatly appreciate any advice/tips. I believe this would help other people besides me, because I found a lot of other people searching for similar things. Additionally, it may lead to TSQL experts showing examples of such data migration scripts and/or explaining how to use the existing tools in ways others hadn’t tried before, or pointing out functions/capabilities that are not adequately explained in the documentation.
Thank you,
L
First this:
Additionally, future data is going to come into the new database from
various sources (like maybe excel for example)...?
That's what SSIS is for. Setting up SSIS is not a trivial task, but it's not rocket science either. SQL Server Management Studio has an Import/Export Wizard, which is an easy-to-use SSIS package creator. That will get you started. There are many alternatives, such as PowerShell, but SSIS is the quickest and easiest solution IMO, especially when dealing with data from multiple sources.
SSIS works nicely with Microsoft Products as data sources (such as Excel and Sharepoint).
For some things, you can also create an MS Access front end that interfaces with SQL Server via SQL Server stored procedures. It just depends on the target audience. This is easy to set up. A quick Google search will return many simple examples. It's actually how I learned SQL Server 20+ years ago.
Is there some feature in SQL Server that will allow me to send the
data to the right tables that already exist in my normalized SQL
Server database?
Yes, but don't. For what you're describing it will be frustrating.
Or do I import to the staging area and write the script(s) to dissect
and extract the data to the respective normalized table?
This.
If it is the latter, can someone please show me some tips/examples of
what the TSQL would look like to do this sort of thing.
When dealing with denormalized data, a good splitter is important. Here are my two favorites:
DelimitedSplit8K
PatternSplitCM
In SQL Server 2016 you also have STRING_SPLIT, which is faster (but has issues).
Another must have is a good NGrams function. The link I posted has the function attached at the bottom of the article. I have some string cleaning functions here.
The links I posted have some good examples.
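To make the idea of a splitter concrete: it turns one packed, delimited value into one row per item, usually together with the item's position. The sketch below is plain Python and is not a port of either T-SQL function named above. The function name split_items and the sample data are hypothetical.

```python
def split_items(rows, delimiter=","):
    """Yield (row_id, item_number, item) for each delimited value in a packed column."""
    for row_id, packed in rows:
        if packed is None:
            continue  # nothing to shred for this row
        for item_number, item in enumerate(packed.split(delimiter), start=1):
            item = item.strip()
            if item:  # skip empty fragments such as "a,,b"
                yield (row_id, item_number, item)

# Hypothetical staging rows: (id, packed column).
rows = [(1, "red, green,blue"), (2, "black"), (3, None)]
shredded = list(split_items(rows))
for row in shredded:
    print(row)
```

The (row_id, item_number, item) rows are what you would then insert into a child table in the normalized schema.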
I agree with all the approaches mentioned: Load the data into one staging table (possibly using SSIS) then shred it with T-SQL (probably wrapped up in stored procedures).
This is a custom piece of work that needs hand-built scripts. There's no automated tool for this because both your source and target schemas are custom schemas, so you'd need to define all the mappings and rules somehow... and no, SSIS does not magically do this!
It sounds like you have a target schema and the mappings between source and target schema already worked out.
As an example your first step is to load 'lookup' tables with this kind of query:
INSERT INTO TargetLookupTable1 (Field1,Field2,Field3)
SELECT DISTINCT Field1,Field2,Field3
FROM SourceStagingTable
TargetLookupTable1 should already have an identity primary key defined (which is not mentioned in the above query because it is auto generated)
This is where you will find your first problem. You'll almost definitely find that your distinct query just gives you a whole lot of duplicated, misspelt, rubbish data. So before you even load your lookup table you need to do data cleansing.
I suggest you clean the data in your source system directly but it depends how comfortable you are with that.
The next step assumes your data is all clean and you've loaded a dozen lookup tables in this way.
Now you need to load transactions, but you don't know the lookup key that you just generated!
The trick is to include an empty column for this key in your staging table in advance.
Once you've loaded your lookup table, you can write the key back into the staging table. This query matches back on the fields you used to load the lookup and writes the key back into the staging table:
UPDATE TGT
SET MyNewLookupKey = NewLookupTable.MyKey
FROM SourceStagingTable TGT
INNER JOIN
NewLookupTable
ON TGT.Field1 = NewLookupTable.Field1
AND TGT.Field2 = NewLookupTable.Field2
AND TGT.Field3 = NewLookupTable.Field3
Now you have a column called MyNewLookupKey in your staging table which holds the correct lookup key to load into your transaction table.
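The whole staging, lookup, and write-back sequence can be tried end to end on a toy data set. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for SQL Server, so the syntax differs slightly: the write-back is a correlated subquery rather than UPDATE ... FROM. The Amount column, the TransactionTable name, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceStagingTable (
        Field1 TEXT, Field2 TEXT, Amount REAL,
        MyNewLookupKey INTEGER           -- empty column reserved for the generated key
    );
    CREATE TABLE NewLookupTable (
        MyKey INTEGER PRIMARY KEY AUTOINCREMENT,
        Field1 TEXT, Field2 TEXT
    );
    CREATE TABLE TransactionTable (
        LookupKey INTEGER REFERENCES NewLookupTable (MyKey),
        Amount REAL
    );
""")
conn.executemany(
    "INSERT INTO SourceStagingTable (Field1, Field2, Amount) VALUES (?, ?, ?)",
    [("Acme", "NY", 10.0), ("Acme", "NY", 25.0), ("Globex", "LA", 5.0)],
)

# Step 1: load the lookup table with the distinct combinations.
conn.execute("""
    INSERT INTO NewLookupTable (Field1, Field2)
    SELECT DISTINCT Field1, Field2 FROM SourceStagingTable
""")

# Step 2: write the generated key back into the staging table.
conn.execute("""
    UPDATE SourceStagingTable
    SET MyNewLookupKey = (
        SELECT L.MyKey FROM NewLookupTable AS L
        WHERE L.Field1 = SourceStagingTable.Field1
          AND L.Field2 = SourceStagingTable.Field2
    )
""")

# Step 3: load the transactions using the key instead of the raw fields.
conn.execute("""
    INSERT INTO TransactionTable (LookupKey, Amount)
    SELECT MyNewLookupKey, Amount FROM SourceStagingTable
""")

lookup_count = conn.execute("SELECT COUNT(*) FROM NewLookupTable").fetchone()[0]
unmatched = conn.execute(
    "SELECT COUNT(*) FROM SourceStagingTable WHERE MyNewLookupKey IS NULL"
).fetchone()[0]
transaction_count = conn.execute("SELECT COUNT(*) FROM TransactionTable").fetchone()[0]
print(lookup_count, unmatched, transaction_count)
```

Checking for staging rows whose key is still NULL after the write-back is worth doing on the real data too. Rows with NULL in any of the matching fields will not join with a plain equality comparison, so they show up there.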
Ongoing uploads of data are a separate issue, but you might want to investigate an MS Access Data Project (although they are apparently being phased out, they are very handy as a front end to SQL Server).
The thing to remember is: if there is anything ambiguous about your data, for example, "these rows say my car is black but these rows say my car is white", then you (a human) need to come up with a rule for "disambiguating" it. It can't be done automatically.
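What can be automated is finding the ambiguous rows so that a human can decide on the rule. A minimal sketch in Python, using the car example above. The field names and the function name find_conflicts are hypothetical.

```python
from collections import defaultdict

def find_conflicts(rows, key_field, value_field):
    """Return {key: sorted distinct values} for keys that carry more than one value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_field]].add(row[value_field])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}

# Hypothetical staging rows.
rows = [
    {"car_id": 1, "color": "black"},
    {"car_id": 1, "color": "white"},
    {"car_id": 2, "color": "red"},
    {"car_id": 2, "color": "red"},
]
conflicts = find_conflicts(rows, "car_id", "color")
print(conflicts)
```

The script only reports the conflicts. Which value wins is still a business decision.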
So there are quite a number of ways to skin this cat. I don't know much about the "Migration Assistant", but I somehow doubt it's going to make your life easier given what you're trying to do.
I'd just dump the whole denormalized mess into a single big staging table then shred it where you need it using SQL. I know you asked for help with the TSQL, but without having some idea of what the denormalized data is and how you want to re-shape it, all I can do really is suggest you read up on SQL in general (select, from, where, group by, etc).
You could also do the work in SSIS, but ultimately the solution you use is largely going to depend on the nature of how you need to normalize the big denormalized data set. IMHO doing this in SQL is usually the easiest way, but then again when you're a hammer, everything looks like a nail.
As far as future proofing the process, how you import the Access data probably will have little bearing on how you'd import Excel data. If you have a whole lot of different data sources which you'll need to incorporate on a recurring basis, SSIS might be a good choice to invest some time and effort into for the long run. No matter what, incorporating data from a distinct data source takes time and effort. You'll have to do some extra work no matter what. I would weigh how frequently you think you'll have to integrate a given data source against how much effort is involved to massage it into the format you want.
I have a completely different opinion, because I do both database development and Microsoft's Power BI. On the PBI side we come across a lot of non-normalized data because a lot of the data comes in from Excel.
My guess is that what is now in Access was an import of something originally begun in Excel.
Excel Power Query and PBI offer transforms to pivot and unpivot the layout. I would use these tools to do that task, then import the results into SQL.
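For anyone unfamiliar with the term, "unpivot" means turning repeated columns in one wide row into one row per value. The sketch below shows the transform in plain Python. It is not Power Query, and the column names and the function name unpivot are hypothetical.

```python
def unpivot(rows, id_field, value_fields):
    """Turn each wide row into one (id, attribute, value) row per non-empty column."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            value = row.get(field)
            if value not in (None, ""):  # blank cells produce no row
                long_rows.append((row[id_field], field, value))
    return long_rows

# Hypothetical wide rows, as they might come out of a spreadsheet.
wide = [
    {"customer": "A", "phone_home": "555-0100", "phone_work": "", "phone_mobile": "555-0199"},
    {"customer": "B", "phone_home": None, "phone_work": "555-0142", "phone_mobile": None},
]
long_rows = unpivot(wide, "customer", ["phone_home", "phone_work", "phone_mobile"])
for row in long_rows:
    print(row)
```

The long form maps directly onto a child table with one row per customer and phone type.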
I want to know how I can find the relationships in a database when they are not defined by primary keys in the design.
For example, I have three tables in the database, like this:
Table 1 : fields
Table 2 : area
Table 3 : location
All of the tables contain data, but whoever created the database did not document the primary keys and foreign keys, so how can I find the relationships between these tables?
Note: I am using SQL Server 2008.
If there's no documentation at all and no defined foreign keys, your options are:
Look at the application source code to see what it's doing. This may or may not be available to you. If it isn't, you're in a fair bit of trouble.
Contact whoever wrote the application originally and ask them for help or documentation. This may or may not be possible.
Guess. I'm certain you'll have to do a lot of this no matter what.
Run the application while SQL Profiler captures a trace of the SQL queries sent to the DB, or use Extended Events in SSMS. I don't recommend running Profiler on a production DB due to the performance impact. I've never used Extended Events, but I know that they're replacing Profiler's capture abilities in forthcoming editions of SQL Server. Neither of these tools is particularly simple (Profiler isn't; Extended Events don't look any better from the doc). You're going to need to read a fair bit of documentation.
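The guessing can be made a little more systematic by letting the data suggest candidates: a column is a plausible foreign key if every non-null value in it also appears in a column of another table whose values are unique. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server. The table and column names are hypothetical, loosely based on the question, and the function name candidate_foreign_keys is my own.

```python
import sqlite3

def candidate_foreign_keys(conn, columns):
    """columns: list of (table, column). Return ((child), (parent)) column pairs
    where the parent values are unique and every child value exists in the parent."""
    def is_unique(table, col):
        total, distinct = conn.execute(
            f'SELECT COUNT("{col}"), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        return total > 0 and total == distinct

    candidates = []
    for p_table, p_col in columns:
        if not is_unique(p_table, p_col):
            continue  # a referenced column should behave like a key
        for c_table, c_col in columns:
            if c_table == p_table:
                continue
            has_values = conn.execute(
                f'SELECT COUNT("{c_col}") FROM "{c_table}"'
            ).fetchone()[0]
            orphans = conn.execute(
                f'SELECT COUNT(*) FROM "{c_table}" c '
                f'WHERE c."{c_col}" IS NOT NULL AND NOT EXISTS '
                f'(SELECT 1 FROM "{p_table}" p WHERE p."{p_col}" = c."{c_col}")'
            ).fetchone()[0]
            if has_values > 0 and orphans == 0:
                candidates.append(((c_table, c_col), (p_table, p_col)))
    return candidates

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE area (area_id INTEGER, name TEXT);
    CREATE TABLE location (location_id INTEGER, area_id INTEGER);
    INSERT INTO area VALUES (1, 'North'), (2, 'South');
    INSERT INTO location VALUES (10, 1), (11, 1), (12, 2);
""")
columns = [("area", "area_id"), ("location", "location_id"), ("location", "area_id")]
candidates = candidate_foreign_keys(conn, columns)
print(candidates)
```

Treat the output as hypotheses only. Small integer ID ranges overlap by coincidence all the time, so every candidate still needs checking against the application code or the captured queries.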
I have some DBF files from a FoxPro database. However, I have no idea what the relationships between the tables are: which foreign keys exist, which tables are interrelated, and so on. Is there any tool that can help me learn the relationships easily? I mean one that can draw the relationships, instead of me figuring them out by trial and error.
I want to export this database to Microsoft SQL Server, so I want to learn the whole database schema to understand the logic of the tables.
Thanks
Is this just a set of VFP free tables or is there a VFP database, as well? If there's a database, you'll have files with DBC, DCT and DCX extensions. If you have those, open the database and take a look:
OPEN DATABASE whatever
MODIFY DATABASE
If there are relationships defined between the tables, you'll see them there.
VFP ships with a program called GENDBC that will generate code to recreate a database. Since it's all SQL code, that might help you to see what's in there, as well.
The Stonefield Database Toolkit is designed to work with Visual FoxPro and has a lot of documenting ability. Not free though.
We are putting together a set of standards for our database. I am worried that down the road people will forget the standards or new developers will come online and not bother to use them.
I am wondering if there is a tool to audit standards and provide a report based on them. I would like it to cover things ranging from naming conventions for columns to not having GUIDs as the primary key.
Apex SQL used to have a tool like this called Enforce. But they discontinued it. Is there any such tool still on the market?
You can do a lot of things like this with Policy-Based Management. For example, here are a few tips I wrote for mssqltips that describe how to do a couple of things:
Enforce database naming conventions
Identify SQL Servers with inefficient power plans
Find unused indexes
Find all columns of a specific data type
Some various tips by other authors as well:
SQL Server Policy Based Management Tips
The sky's the limit, really. Anything you can run a SQL query to get a scalar result (and several other things as well), you can check with PBM.
For object-level stuff, you can get a good part of the way there using simple DDL triggers. For these you can simply hook onto DDL events (e.g. CREATE TABLE) and roll back if your naming conventions or other criteria are not being upheld. They work very similarly to a DML trigger for modifying data in a table.
Just keep in mind that you can't always enforce everything. For example, you can't roll back things that aren't "transactionable" (such as CREATE DATABASE) using either PBM or DDL triggers. And be careful where you put your "on change prevent" type of enforcement. For example, rolling back a CREATE INDEX that took 12 hours isn't going to go over very well if it was rolled back only because it wasn't named correctly.
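Where prevention is too heavy-handed, a report-only audit is an alternative. The sketch below is plain Python, not PBM and not a DDL trigger. It assumes the column metadata has already been pulled from the catalog as (table, column, data type, is primary key) rows. The PascalCase rule, the sample rows, and the function name audit_columns are hypothetical.

```python
import re

# Hypothetical convention: column names are PascalCase.
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")

def audit_columns(columns):
    """columns: iterable of (table, column, data_type, is_primary_key).
    Return a list of (table, column, problem) tuples."""
    violations = []
    for table, column, data_type, is_pk in columns:
        if not PASCAL_CASE.match(column):
            violations.append((table, column, "name is not PascalCase"))
        if is_pk and data_type.lower() == "uniqueidentifier":
            violations.append((table, column, "GUID used as primary key"))
    return violations

# Hypothetical metadata rows.
columns = [
    ("Customer", "CustomerId", "int", True),
    ("Customer", "first_name", "varchar", False),
    ("Order", "OrderId", "uniqueidentifier", True),
]
report = audit_columns(columns)
for line in report:
    print(line)
```

Because it only reads metadata, a script like this can run on a schedule without any risk of rolling back someone's long-running DDL.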
SSW of Australia also has a really nice tool for this called SQL Auditor.
They check SQL Server databases against a whole slew of "best practice" rules and give you a report on how you do according to their ruleset.
I've recently inherited the job of maintaining a database that wasn't designed very well and the designers aren't available to ask any questions. And I have a couple more coming my way in the near future.
It's been tough trying to figure out the relationships between the tables without any kind of visual aid or database diagram.
I was wondering what tools are recommended for this. I know about Visio, but I was hoping there were some good open source/freeware applications out there. I don't need it to change the database at all. Just read it and create some kind of visual aid to help me understand how things are laid out and try to figure out what the designer was thinking about how the data should relate.
Additional answer data: SchemaSpy was the kind of thing I was looking for, but having not done a lot with the command line in ages, I opted to use SchemaSpyGUI. There was also some configuration to get used to since I don't work with Java much, but the end result was what I was looking for (an open-source replacement for Visio's ER diagrams).
Try SchemaSpy. I ran it against a rather complex database and I was quite pleased by the result, with advice on optimization.
Try DBVis - download at http://www.minq.se/products/dbvis/ - there is a pro version (not needed) and an open version that should suffice.
All you have to do is get the right JDBC database driver. The tool shows tables and references in orthogonal, hierarchical, or circular layouts ;-) etc., just by pressing one single button. Enjoy!
What DBMS (Database Management System) are you using? Many modern DBMSs like SQL Server and Access can create an E-R diagram for you.
Microsoft Visio is an excellent tool and can reverse engineer SQL from any datasource.
DDT (Database Design Tool) can reverse engineer from raw SQL on windows and is very lightweight (very small free download).
MySQL Workbench is one of the more popular MySQL tools and has a freely downloadable version.
SQLFairy can do the same for MySQL on Linux.
dbdesc is not free, but I've heard very good things about it. It works with several of the major databases out there.
I have been lucky in that I haven't had to decipher other people's database schemas yet. I have used a set of templates that come with CodeSmith.
Firstly, may I say that I feel your pain!
Here are a couple of my tips:
In general, a tool will only be helpful if the designers have correctly defined all the primary and foreign keys, so be aware that a tool might not pick up all the important relationships.
The most useful thing is to see what queries are being performed by the client code. This will tell you not only what relationships exist, but which tables and relationships are the most frequently used - that's where you'll want to concentrate your effort.
There is a bit of open-source software out there but Visio Professional's tool for reverse-engineering database schemas is quite good because it de-couples the process of reverse-engineering and diagramming. I use this a lot because it tends to be readily available at most sites.
One nice feature of Visio is that you can reverse engineer and then construct your own diagrams from the reverse-engineered schema. Doing this is a very good way to explore the schema and understand it, because you do this work as part of interactively building a reference document for the schema. I've used this technique to reverse engineer everything from Activity Based Costing Systems to Insurance Underwriting Systems, typically without much help from the vendor. Tinkering about with Visio diagrams is quite relaxing.
Between this and a little hypothesis testing about FK relationships (If the FK is not physically present on the table) you can make sense of quite complex schemas. I've found this diagramming approach makes Visio a head-and-shoulders leader because you can easily interact with the reverse-engineered model in a fairly convenient way. You can fill in missing foreign keys, build subject area diagrams and add annotations on the diagrams. The interactivity of this process makes it a good learning tool.
This is a somewhat subjective view, but the interactivity works very well as a learning process for me and it's by far my preferred approach. Most sites won't begrudge you the £300 or so for a license, if they don't already have it available. The only site I ever worked at where they had to get it in was because they had Visio Standard instead of Pro. I asked nicely and the PHB signed it off.
I use MySQL Workbench (http://www.mysql.com/products/workbench/) for MySQL databases. You can attach the workbench to your database and it will draw the ER diagram for you.
Using pgsql/win32, I found the easiest solution was to write a Perl script that made use of Graph::Easy from CPAN. Query the database for foreign key relationships, then make a directed graph with tables as nodes and FK relationships as links. If this is your setup, I can post the code.
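The same idea can be sketched without Perl. The snippet below is not the script mentioned above. It is a minimal Python illustration that swaps Graph::Easy for Graphviz DOT text as the output format and assumes the foreign-key rows have already been fetched from the catalog as (child table, referenced table) pairs. The table names and the function name fk_graph_to_dot are hypothetical.

```python
def fk_graph_to_dot(fk_pairs, name="schema"):
    """Render (child_table, referenced_table) pairs as Graphviz DOT text."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    # One box per table that takes part in at least one foreign key.
    for table in sorted({t for pair in fk_pairs for t in pair}):
        lines.append(f'  "{table}" [shape=box];')
    # One arrow per foreign key, pointing from the child to the referenced table.
    for child, referenced in sorted(set(fk_pairs)):
        lines.append(f'  "{child}" -> "{referenced}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical foreign-key rows.
fk_pairs = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
]
dot = fk_graph_to_dot(fk_pairs)
print(dot)
```

Saved to a file, the text can be rendered with Graphviz, for example with dot -Tpng schema.dot -o schema.png.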
I like to try and see if the applications that use the database have ways of logging the SQL they use (or the DB backend itself, but that tends to be less tractable). Getting a feel for what requests are performed on the database helps you concentrate on the important tables.
As with most things, the 80/20 rule applies here: 20% of the tables will do 80% of the interesting stuff. Once you've figured them out, a diagram is rarely necessary.
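Once you have a log of SQL statements, a rough count of table references is enough to find that 20%. The sketch below is Python with a deliberately crude regular expression rather than a real SQL parser, so it will miss quoted identifiers and derived tables. The logged statements and the function name table_usage are hypothetical.

```python
import re
from collections import Counter

# Crude heuristic: the identifier that follows FROM, JOIN, INTO or UPDATE.
TABLE_REF = re.compile(r"\b(?:from|join|into|update)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def table_usage(statements):
    """Count how often each table name is referenced across logged SQL statements."""
    counts = Counter()
    for sql in statements:
        counts.update(name.lower() for name in TABLE_REF.findall(sql))
    return counts

# Hypothetical logged statements.
log = [
    "SELECT * FROM loan l JOIN customer c ON c.id = l.customer_id",
    "UPDATE loan SET status = 'PAID' WHERE id = 7",
    "INSERT INTO payment (loan_id, amount) VALUES (7, 100)",
    "SELECT name FROM customer WHERE id = 3",
]
usage = table_usage(log)
for table, count in usage.most_common():
    print(table, count)
```

Tables that never appear in a representative log are good candidates for being dead weight, which matters when dev and production disagree about how many tables exist.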
Look at the primary key/foreign key relationships that have been set up as a starting place.
Since a database without existing diagrams may not have relationships set up formally, I look at the table structures and names and make my best guesses as to what might be related to what, then dig into the structures to see if there are obvious (but undefined) foreign keys. I look at the stored procs to get an idea as to how the tables are joined and what fields are being queried on.
While automated tools to figure out the database can be spiffy, I find that when I really dig into the details of the database myself, I end up with a much better understanding than I can get from any picture created automatically.
I have some pretty good experience with Aqua Data Studio for reverse engineering a DB schema. It is very feature rich and supports even more exotic databases like Informix or Sybase.
This helped me with generating the ER diagrams on MS SQL Server 2012:
MS SQL Server Management Studio > File menu > "Connect Object Explorer"
Choose your Database node and expand it. under this node you'll find a sub-node called "Database Diagrams"
Right click on "Database Diagrams" > "New Database Diagram" > Add tables that you wish to see their columns, relationships, ...
Use Visio. If using Visio 2010, you will need to use the Generic OLEDB Provider for SQL Server to ensure that there will be no problems with connecting to the Visio Driver.