Searching for a string 'somewhere' in a database - database

Here's my problem: I'm looking at someone's Postgresql based database application for the first time and trying to find what is causing certain warnings/errors in the system's logfile. I don't know anything about the database schema. I don't know anything about the source code. But I need to track down the problem.
I can easily search for string contents of text file based code like php and perl, using the UNIX command 'grep'; even for compiled binaries I can use use the UNIX commands 'find' and 'strings'.
My problem is that some of the text produced in the logfile comes from the database itself. Checking the error logfile for the database yields nothing useful as there are no problems with the queries used by the application.
What I would like to do is exhaustively search all of the columns and all of the tables of the database for an string. Is this possible, and how?
Thanks in advance for any pointers. The environment used is Postgresql 8.2, but it would be useful to know how to do this in other flavors of relational databases as well.

It may not be optimal, but since I already know how to grep a text file I would just covert the database to a text file and grep that. Converting the database to a text file in this case would mean dumping the data using pg_dump.
The quickest/easiest/most efficient way isn't always elegant...

I am not familiar with Postgresql, but I would think that, like SQL Server, it has meta-data tables/views that describe the schema of the database (for SQL Server 2005+, I'd be referring you to sys.tables and sys.columns). The idea would be to generate a series of ad-hoc queries based on the table schema, each one finding matches in a particular table/field combination and pumping matches into a "log" table.

I've used variants of this in the past.

Related

Most Efficient Way to Migrate Un-Normalized Data in an Access Database to a Normalized Form in a SQL Server Database

I've been doing some research on this topic for a while now and can't seem to find a similar instance to my issue. I will try and explain everything as best I can, as simply as I can.
The problem is in the title; I am trying to migrate data from an Access database to SQL Server. Typically, this isn't really a hard problem as there exists several import/export tools within SQL Server but I am looking for the best solution. That or some advice/tips as I am somewhat new to database migration. I will now begin to explain my situation.
So I am currently working on migrating data that exists in an Access “database” (database in quotes because I don’t think it is actually a database, you’ll know why in a minute) in an un-normalized form. What I mean by un-normalized is that all of the data is in one table. This table has about 150+ columns and the rows number in the thousands. Yikes, I know; this is what I’ve walked into lol. Anyways, sitting down and sorting through everything, I’ve designed relationships for the data that normalize it nicely in its new home, SQL Server. Enter my predicament (or at least part of it). I have the normalized database set up to hold the data but I’m not sure how to import it, massage/cut it up, and place it in the respective tables I’ve set up.
Thus far I’ve done a bunch of research into what can be done and for starters I have found out about the SQL Server Migration Assistant. I’ve begun messing with it and was able to import the data from Access into SQL Server, but not in the way I wanted. All I got was a straight copy & paste of the data into my SQL Server database, exactly as it was in the Access database. I then learned about the typical practice of setting up a global table/staging area for this type of migration, but I am somewhat of a novice when it comes to using TSQL. The heart of my question comes down to this; Is there some feature in SQL Server (either its import/export tool or the SSMA) that will allow me to send the data to the right tables that already exist in my normalized SQL Server database? Or do I import to the staging area and write the script(s) to dissect and extract the data to the respective normalized table? If it is the latter, can someone please show me some tips/examples of what the TSQL would look like to do this sort of thing. Obviously I couldn’t expect exact scripts from anyone without me sharing the data (which I don’t have the liberty of as it is customer data), so some cookie cutter examples will work.
Additionally, future data is going to come into the new database from various sources (like maybe excel for example) so that is something to keep in mind. I would hate to create a new issue where every time someone wants to add data to the database, a new import, sort, and store script has to be written.
Hopefully this hasn’t been too convoluted and someone will be willing (and able) to help me out. I would greatly appreciate any advice/tips. I believe this would help other people besides me because I found a lot of other people searching for similar things. Additionally, it may lead to TSQL experts showing examples of such data migration scripts and/or an explanation of how to use the tools that exist in such a way the others hadn’t used before or have functions/capabilities not adequately explained in the documentation.
Thank you,
L
First this:
Additionally, future data is going to come into the new database from
various sources (like maybe excel for example)...?
That's what SSIS is for. Setting up SSIS is not a trivial task but it's not rocket science either. SQL Server Management Studio has an Import/Export Wizard which is a easy-to-use SSIS package creator. That will get you started. There's many alternatives such as Powershell but SSIS is the quickest and easiest solution IMO. Especially when dealing with data from multiple sources.
SSIS works nicely with Microsoft Products as data sources (such as Excel and Sharepoint).
For some things too, you can create an MS Access Front-end that interfaces with SQL Server via sql server stored procedures. It just depends on the target audience. This is easy to setup. A quick google search will return many simple examples. It's actually how I learned SQL server 20+ years ago.
Is there some feature in SQL Server that will allow me to send the
data to the right tables that already exist in my normalized SQL
Server database?
Yes and don't. For what you're describing it will be frustrating.
Or do I import to the staging area and write the script(s) to dissect
and extract the data to the respective normalized table?
This.
If it is the latter, can someone please show me some tips/examples of
what the TSQL would look like to do this sort of thing.
When dealing with denormalized data a good splitter is important. Here's my two favorites:
DelimitedSplit8K
PatternSplitCM
In SQL Server 2016 you also have split_string which is faster (but has issues).
Another must have is a good NGrams function. The link I posted has the function attached at the bottom of the article. I have some string cleaning functions here.
The links I posted have some good examples.
I agree with all the approaches mentioned: Load the data into one staging table (possibly using SSIS) then shred it with T-SQL (probably wrapped up in stored procedures).
This is a custom piece of work that needs hand built scripts. There's no automated tool for this because both your source and target schemas are custom schemas. So you'd need to define all that mapping and rules somewhow.... and no SSIS does not magically do this!
It sounds like you have a target schema and mappings between source and target schema already worked out
As an example your first step is to load 'lookup' tables with this kind of query:
INSERT INTO TargetLookupTable1 (Field1,Field2,Field3)
SELECT DISTINCT Field1,Field2,Field3
FROM SourceStagingTable
TargetLookupTable1 should already have an identity primary key defined (which is not mentioned in the above query because it is auto generated)
This is where you will find your first problem. You'll almost definitely find your distinct query just gives you a whole lot of duplicated mispelt data rubbish data. So before you even load your lookup table you need to do data cleansing.
I suggest you clean the data in your source system directly but it depends how comfortable you are with that.
Next step is: assuming your data is all clean and you've loaded a dozen lookup tables in this way..
Now you need to load transactions but you don't know the lookup key that you just generated!
The trick is to pre-include an empty column for this in your staging table to record this
Once you've loaded up your lookup table you can write the key back into the staging table. This query matches back on the fields you used to load the lookup, and writes the key back into the staging table
UPDATE TGT
SET MyNewLookupKey = NewLookupTable.MyKey
FROM SourceStagingTable TGT
INNER JOIN
NewLookupTable
ON TGT.Field1 = NewLookupTable.Field1
AND TGT.Field2 = NewLookupTable.Field2
AND TGT.Field3 = NewLookupTable.Field3
Now you have a column called MyNewLookupKey in your staging table which holds the correct lookup key to load into you transaction table
Ongoing uploads of data is a seperate issue but you might want to investigate an MS Access Data Project (although they are apparently being phased out, they are very handy for a front end into SQL Server)
The thing to remember is: if there is anything ambiguous about your data, for example, "these rows say my car is black but these rows say my car is white", then you (a human) needs to come up with a rule for "disambiguating" it. It can't be done automatically.
So there are quite a number of ways to skin this cat. I don't know much about the "Migration Assistant", but I somehow doubt it's going to make your life easier given what you're trying to do.
I'd just dump the whole denormalized mess into a single big staging table then shred it where you need it using SQL. I know you asked for help with the TSQL, but without having some idea of what the denormalized data is and how you want to re-shape it, all I can do really is suggest you read up on SQL in general (select, from, where, group by, etc).
You could also do the work in SSIS, but ultimately the solution you use is largely going to depend on the nature of how you need to normalize the big denormalized data set. IMHO doing this in SQL is usually the easiest way, but then again when you're a hammer, everything looks like a nail.
As far as future proofing the process, how you import the Access data probably will have little bearing on how you'd import Excel data. If you have a whole lot of different data sources which you'll need to incorporate on a recurring basis, SSIS might be a good choice to invest some time and effort into for the long run. No matter what, incorporating data from a distinct data source takes time and effort. You'll have to do some extra work no matter what. I would weight how frequently you think you'll have to integrate a given data source, and how much effort is involved to massage it into the format you want.
I have a completely different opinion. Because I do both database development and Microsoft's Power BI - - on the PBI side we come across a lot of non-normalized data because a lot of the data is coming in from excel.
My guess is that what is now in Access was an import of something originally began in excel.
Excel Power Query and PBI offers transforms to pivot and unpivot layout. I would use these tools to do that task. Then import the results into SQL.

Can I dump an entire Microsoft SQL Server database from Linux?

I've got a linux server that already connects happily to a MS SQL Server and I want to know if there is a way to dump the whole thing into a format I can read. I don't have access to the desktop, but I can connect using PHP and I can issue whatever commands I want. I have admin access to the SQL Server, so no problem there.
My main goal is to understand how the people before me set this thing up. I already know how to get the stored procedures as text (SELECT * FROM sys.procedures), but I was wondering if there is a way to get the whole database. I'm not very familiar with SQL Server so I don't know what important bits I might be missing.
And I don't care if the solution is in PHP or not. That's just the thing I've got working right now. Any SQL-ish command that dumps the entire database would solve my world.
To summarize:
I don't have access to the actual machine/desktop
I have admin access to the DB using PHP's mssql libs
I'm on linux
I want a text file I can look at that tells me everything in the database
My goal is not to answer a specific question - I'm looking to understand what the people before me did when they set up this database. Unknown unknowns, and all that.
Okay, hopefully I've made sense. I'm sorry if I've been a complete idiot. Be gentle. Thanks!
I would backup (http://msdn.microsoft.com/en-us/library/ms186865.aspx) the database to file and then download it, restore it on Windows and then use SQL Server tools like SQL Server Management Studio etc. to look at it.
There is plenty you can do with the metadata, but you could spend a lot of time writing queries instead of using existing off-the-shelf documentation tools.
You can use this script to create insert statements for any given table.
This stackoverflow question will tell you how to generate create table statements.
SELECT NAME FROM sys.tables will give you a list of table names.
You would probably save a LOT of time and pain by just using native SQL SErver Windows tools that work with it.

How to build big and complex database in sql - IN EASY WAY?

I have installed Oracle XE. I build small database every day to practice from command prompt, but now I want to have more. I want to have a bigger database with a lot of different data to practice and make exercises.
So, is possible to get a big data file from somewhere and upload to XE database?
You can't get 'big' data for Oracle Express edition as it is limited to 4GB (10g) or 10GB (11g ).
That said, there are public datasets available. Personally I like the FAA data on registered aircraft owners/operators
As you are practicing with Oracle, perhaps a good solution (which will also generate exactly the data you need) would be to write your own stored procedures to generate your data in a loop (or similar construct).
You could then generate as much as you like whilst also practicing your handling of large datasets and writing of efficient PL/SQL and SQL code.
This way your data will match your current database structure too without having to build a new database matching whichever dataset you download from the web.
IIRC there are sample schemas as HR that can be enabled. See this.

Data migration between different DBMS's

As i couldnt get any satisfying answer to my Question it seems we have to write our own program for that, we are in the design phase and we are thinking which format shall we use to backup the data.
The program will be written in Delphi.
Needed is Exporting/Importing data between Oracle/Informix/Msserver, very important here is the Performance issue, as this program will run on a 1-2 GB Databases. Beside the normal data there are Blobs in the Database which have to be backuped.
We thought of Xml-Data or comma-separated data as both are transparent (which is nice to have), but Blobs must be considered here. Paradox format is not optinal in this case.
Can anybody recommend some performant formats?
Any other Ideas to achieve the same Goal are welcome.
Thanx in Advance.
I use an excellent program called OmegaSync for my backups, but it will only handle Informix via ODBC and not directly. If you find you can use OmegaSync, you'll find its performance to be excellent, because it compares the databases first, and then syncs only the differences. You might want to use this idea if you decide to do the programming yourself if efficiency is your number one goal.
But programming database conversion is very complex as others answers to your question have said. So why not just develop the SQL you need, and do the conversion that way. For example see: Convert Informix Schema to Oracle Schema Or Any Other RDBMS For moving the data, check out sources like: Moving non-informaix data between computers and dbspaces
You can optimize the SQL to what I'm sure will be an adequate speed if you dump and load your data smartly.
DbUnit is a popular tool which can extract and load data in XML format, see
http://www.dbunit.org/faq.html#extract
// partial database export
QueryDataSet partialDataSet = new QueryDataSet(connection);
partialDataSet.addTable("FOO", "SELECT * FROM TABLE WHERE COL='VALUE'");
partialDataSet.addTable("BAR");
FlatXmlDataSet.write(partialDataSet, new FileOutputStream("partial.xml"));
// full database export
IDataSet fullDataSet = connection.createDataSet();
FlatXmlDataSet.write(fullDataSet, new FileOutputStream("full.xml"));
Did you check ODI (Oracle Data Integrator) It has support for lots of source databases. It is able to capture changes from the source databases and integrate them in the target database. It is performant but has a price tag.
Ronald.
The new DBExpress framework give the possibility to exporting/importing data between many databases. you can check this CodeRage session Deep Dive into dbExpress by John Kaster
You should use your own binary format, integrated by (xml for text/streams for Blobs).
If you have to export metadata too and not only data, it could be very complex. There are many subtle (and not so subtle differences) among the databases you're going to use, that such a format should be general enough and the exporting/importing code should be able to translate and map metadata across databases, and because an external application can't write directly to the database internal structures, it would have to generate the db proper DDL to create the data structures.
As long as this is a proprietary format, IMHO its design is the least of your issues, if size and performance are important and the file is read sequentially it would not be difficult to design a binary format.
Anyway import/exports and backups are two different tasks. If you have to backup a database, use its facilities. They usually allow far more control, i.e. point-in-time recovery. If you have to move data across databases that's another issue - I would write just the code to move data, not metadata, pre-creating the required structure in the target database.
You could give Toad (Quest Software) a try.
It supports all your mentioned platforms and can do things like 'Export table data to INSERT statements' on your source platform which can then be run on the target platform.
IIRC there is even some Toad-internal backup-format which might be cross-platform.
Toad Communities:
Toad for ORACLE
Toad for SQL SERVER
Toad for OTHER RDMBS (including Informix)
Some videos about exporting, importing:
YouTube: Toad for Data Analysts v2.7 Export Enhancements
YouTube: Toad for Data Analysts v2.7 Import Enhancements

Transferring data between different DBMS's

I would like to transfer the whole Database i have in Informix to Oracle. We have an an application which works on both Databases, one of our customers is moving from Informix to Oracle, and needs to transfer the whole Database to Oracle (the structure is the same).
We need often to transfer data between oracle/Mssql/Informix sometimes only one table and not the whole Database.
Does anybody know about any good program which does this kind of job?
The Pentaho Data Integration ETL tools are available as open source (also known under the former name "Kettle") for cross-database migration and many other use cases.
From their data sheet:
Common Use Cases
Data warehouse population with built-in support for slowly changing
dimensions, junk dimensions
Export of database(s) to text-file(s) or other databases
Import of data into databases, ranging from text-files to excel
sheets
Data migration between database applications
...
A list of input / output data formats can be found in the accepted answer of this question: Does anybody know the list of Pentaho Data Integration (Kettle) connectors list?
It supports all databases with a JDBC driver, which means most of them.
Check this question of mine, it includes some very good ideas: Searching for (freeware) database migration tool
you could give the Oracle Migration Workbench a try. See http://download.oracle.com/docs/html/B15858_01/toc.htm If you want to read Informix data into Oracle on a regular basis, using the Heterogeneous Services might be a better option. Check for hs4odbc or dg4odbc, depending on the Oracle release you have.
I hope this helps,
Ronald.
I have done this in the past and it is not a trivial task. We ended up writing out each table out to a pipe delimited flat file and reloading each table into Oracle with Oracle SQL Loader. There was a ton of Perl scripts to scrub the source data and shell scripts to automate the process as much as possible and run things in parallel as well.
Gotchas that can come up:
1. Pick a delimiter that is as unique as possible.
2. Try to find data types that match as close as possible to the Informix ones as possible. ie date vs. timestamp
3. Try to get the data as clean as possible prior to dumping out the flat files.
4. HS will most likely be too slow..
This was done years ago. You may want to investigate Golden Gate (now owned by Oracle) software which may help with the process(GG did not exist when I did it)
Another idea is use an ETL tool to read Informix and dump the data into Oracle (Informatica comes to mind)
Good luck :)
sqlldr - Oracle's import utility
Here's what I did to transfer 50TB of data from MySQL to ORacle. Generated csv files from MySql and used sqlldr utility in oracle to export all the data from the files to oracle db. It is the fastest way to import data. I researched on this for a few weeks and done lot of benchmark test cases and sqlldr is hands down best and fastest way to import into oracle.

Resources