Programmatically verify database structure in C++

How would I verify that a database is structured the way my C++ program expects? Our source control was very weak in the past, so we have many production installs out there using databases which are missing columns and tables that are now required by the current version of the C++ program. I'd like to have my app check at startup that the database is structured the way it expects. Ideally this would work for SQL Server, Oracle, Access and MySQL databases.

The difficulty is in doing this across DBMSs, but ODBC drivers provide most of the functionality you need for all of these databases.
In this situation I have used the ODBC catalog functions SQLTables and SQLColumns (plus SQLStatistics for the indexes) to extract a definition of all tables, columns and indexes in the database, and then compared that to the output of the same process run against a known-good database.
This is easy enough if you just want to validate the structure. The code to repair such a database by adding the missing columns and indexes followed logically from that, but got a little harder.
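As a rough illustration, a check for a single expected column might look like this in C++ (a minimal sketch, assuming you already have a connected SQLHDBC; error handling is trimmed and the function name is my own):

#ifdef _WIN32
#include <windows.h>   // must precede the ODBC headers on Windows
#endif
#include <sql.h>
#include <sqlext.h>
#include <string>

// Returns true if 'table' has a column named 'column', according to the
// driver's catalog. SQLColumns returns one result row per matching column.
bool columnExists(SQLHDBC dbc, const std::string& table, const std::string& column)
{
    SQLHSTMT stmt = SQL_NULL_HSTMT;
    if (!SQL_SUCCEEDED(SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt)))
        return false;

    SQLRETURN rc = SQLColumns(stmt,
                              NULL, 0,                                 // any catalog
                              NULL, 0,                                 // any schema
                              (SQLCHAR*)table.c_str(),  SQL_NTS,
                              (SQLCHAR*)column.c_str(), SQL_NTS);
    bool found = SQL_SUCCEEDED(rc) && SQLFetch(stmt) != SQL_NO_DATA;

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    return found;
}

At startup you would loop over your expected tables and columns, collect everything that is missing, and report it all in one go rather than failing on the first gap.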

Assuming you can query the database using SQL, you can use the DESCRIBE statement to request a description of the table you are looking at, e.g.
DESCRIBE table1;
Then use your code to check through the description and analyse whether it is correct.
This will give you a list of fields, their types, and other information, e.g.
Field | Type    | Null | Key | Default | Extra
------+---------+------+-----+---------+----------------
Col 1 | int(11) | NO   | PRI | NULL    | auto_increment
Col 2 | time    | NO   |     | NULL    |
You can then walk through this result in your code.
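Note that DESCRIBE is MySQL-specific rather than standard SQL. Where the DBMS supports it (SQL Server and MySQL do; Oracle and Access do not), the standard INFORMATION_SCHEMA views give you the same listing in a more portable way, e.g.

SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'table1'
ORDER BY ORDINAL_POSITION;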

"I'd like to have my app check to make sure the database is structured the way it expects at startup time."
And what the ... do you think this is going to solve?
If the database is not structured the way your program expects, then I'd say your program is almost certain to fail (and probably quite early on at that).
Supposing you could do the check (and I'm quite certain you could go a hell of a long way towards achieving that, even far beyond what has been suggested here), what else is there left for you to do but abort your program, saying "cannot run, database not as expected"?
The result is almost guaranteed to be the same in both situations: your program won't run. If you are experiencing problems with "databases not structured as expected", then you need to look at (and fix the faults in) the overall process. Software does not live "in its own world", and neither do databases.

Related

SQL Server Normalisation/Best Practices: Single Data Table

I have inherited the maintenance of a database from a former employee in another department and I believe their database development skills are not really up to snuff.
I have been asked to support or redevelop it.
It appears that the data for every record is stored in one single table (yes, I know), which has hundreds of thousands of rows with mostly empty fields.
TableData:
> RowID
> FieldID
> DateData
> NumberData
> TextData
> YesNoData
Only one field (dependent on the datatype required) appears to be populated in this instance for each row - the rest are empty.
There are two other tables which identify details of the record (created by, etc.) and the field (updated on, field datatype).
Looking through the Access front-end code, it appears that the data for each record and field is retrieved by searching on record and field and then returning whichever data column is populated.
My question: what purpose does this serve, and is this type of development the work of an inexperienced database developer?
My best guess is that a table like this is used to store arbitrary data (inferred from the other supporting tables) without requiring schema changes, so that it can hold information which is "unplanned" or not yet implemented in the business logic of the application. This is commonly known as the Entity-Attribute-Value (EAV) pattern.
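For example, reading a single value back out of such a table means filtering on both keys and picking the typed column that happens to be populated; a sketch against the tables above (the literal IDs are made up):

SELECT TextData        -- or DateData/NumberData/YesNoData, per the Field table
FROM TableData
WHERE RowID = 42       -- which record
  AND FieldID = 7;     -- which logical field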
The questions I would start asking (yourself, any programmers, DBA's, project managers, etc.):
Were the requirements so abstract at the time that it was impossible to create a formal schema with data relationships? (Bad, bad, BAD)
Was the database designer lazy or inexperienced?
Was the programmer lazy or inexperienced? (Better yet, was the programmer the DBA?)
Is the reliability/availability of the data so sensitive that making formal schema changes is hard to do on a regular basis?
Has the project gone through plenty of people before you that simply inherited the problems, and this is a hack solution? (While maybe the original programmer knew where it was intended to go eventually...)
I think what you're really trying to get at here is "does this work, or should I change it?". I'd be shocked if any read/search queries are optimized at all, as there couldn't be useful indexes for such arbitrary data storage. If the application is simply logging information, it probably isn't as big a deal: the originator probably just didn't know yet how the data would be used later on, and writing a one-time applet to loop through the rows and create formal objects out of the data would be better than trying to assume everything at the beginning.
Getting a little more targeted, are you running into any bottlenecks in your process because of this particular table, or are you concerned just out of surprise? If the former, I'd figure out how to change it right away. If the latter, I'd take my time figuring out the long-term requirements of the application first.
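If you do decide to convert it, the "one-time applet" can often be plain SQL that pivots the rows back into a formal table. A hedged sketch: the target Orders table, its columns, and the FieldID values are all invented for illustration:

-- Hypothetical one-time migration: pivot EAV rows into a formal table.
INSERT INTO Orders (RowID, OrderDate, Quantity)
SELECT RowID,
       MAX(CASE WHEN FieldID = 1 THEN DateData END),    -- the 'order date' field
       MAX(CASE WHEN FieldID = 2 THEN NumberData END)   -- the 'quantity' field
FROM TableData
GROUP BY RowID;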

When is it OK to blur the abstraction between data and logic?

I mean referring to specific database rows by their ID, from code, or specifying a class name in the database. Example:
You have a database table called SocialNetwork. It's a lookup table: the application doesn't write to or delete from it. It's mostly there for database integrity; let's say the whole shebang looks like this:
SocialNetwork table:
Id | Description
-----------------------------
1 | Facebook
2 | Twitter
SocialNetworkUserName table:
Id | SocialNetworkId | Name
---------------------------------------------------
1 | 2 | #seanssean
2 | 1 | SeanM
In your code, there's some special logic that needs to be carried out for Facebook users. What I usually do is make either an enum or some class constants in the code to easily refer to it, like:
if (socialNetwork.Id == SocialNetwork.FACEBOOK) // SocialNetwork.FACEBOOK = 1
    // special Facebook-specific functionality here
That's a hard-coded database ID. It's not a huge crime since it's just referencing a lookup table, but there's no longer a clean division between data and logic, and it bothers me.
The other option I can think of would be to specify the name of a class or delegate in the database, but that's even worse IMO because now you've not only broken the division between data and logic, but you've tied yourself to one language now.
Am I making much ado about nothing?
I don't see the problem.
At some point your code needs to do things. Facebook is a real social network, with its own real API, and you want it to do Facebook-specific things in your code. Unless your tasks are trivial, to put all of the Facebook-specific stuff in the database would mean a headache in your code. (What's the equivalent of "Like" in Twitter, for example?)
If the Facebook entry isn't in your database, then the Facebook-specific code won't be executed. You can do that much.
Yep, but with the caveat that "it depends." It's unlikely to change, but it could.
Storing the name of a class or delegate is probably bad, but storing a token used by a class or delegate factory isn't, because it's language-neutral--but you'll always have the problem of having to maintain the connection somewhere. Unless you have a table of language-specific things tied to that table, at which point I believe you'd be shot.
Rather than keep the constant comparison in mainline code, IMO this kind of situation is nice for a factory/etc. pattern, enum lookup, etc. to implement network-specific class lookup/behavior. The mainline code shouldn't have to care how it's implemented, which it does right now--that part is a genuine concern.
With the caveat that ultimately it may never matter. If it were me, I'd at least de-couple the mainline code, because stuff like that makes me twitchy.
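To make the factory idea concrete, here is a minimal C++ sketch (the handler interface and class names are invented; only the enum values come from the lookup table above):

#include <memory>
#include <string>

enum class SocialNetwork { Facebook = 1, Twitter = 2 };  // IDs mirror the lookup table

// Network-specific behaviour lives behind one interface...
struct NetworkHandler {
    virtual ~NetworkHandler() = default;
    virtual void postUpdate(const std::string& text) = 0;
};

struct FacebookHandler : NetworkHandler {
    void postUpdate(const std::string& text) override { /* call the Facebook API */ }
};

struct TwitterHandler : NetworkHandler {
    void postUpdate(const std::string& text) override { /* call the Twitter API */ }
};

// ...so the mainline code asks the factory once instead of comparing IDs everywhere.
std::unique_ptr<NetworkHandler> makeHandler(SocialNetwork net)
{
    switch (net) {
        case SocialNetwork::Facebook: return std::make_unique<FacebookHandler>();
        case SocialNetwork::Twitter:  return std::make_unique<TwitterHandler>();
    }
    return nullptr;  // an ID from the database we don't know about
}

The hard-coded mapping between IDs and classes still exists, but it now lives in exactly one place.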

Names of businesses keyed differently by different people

I have this table
tblStore
with these fields
storeID (autonumber)
storeName
locationOrBranch
and this table
tblPurchased
with these fields
purchasedID
storeID (foreign key)
itemDesc
In the case of stores that have more than one location, there is a problem when two people inadvertently key the same store location differently. For example, take Harrisburg Chevron. On some of its receipts it calls itself Harrisburg Chevron; some just say Chevron at the top, and under that, Harrisburg. One person may key it into tblStore as storeName Chevron, locationOrBranch Harrisburg. A second person may key it as storeName Harrisburg Chevron, locationOrBranch Harrisburg. What makes this bad is that the business's name is Harrisburg Chevron. It seems hard to make a rule (that would understandably cover all future opportunities for this error) to prevent people from doing this in the future.
Question 1) I'm thinking as the instances are found, an update query to change all records from one way to the other is the best way to fix it. Is this right?
Question 2) What would be the best way to have originally set up the db to have avoided this?
Question 3) What can I do to make future after-the-fact corrections easier when this happens?
Thanks.
edit: I do understand that better business practices are the ideal prevention, but for question 2 I'm looking for any tips or tricks that people use that could help. And question 1 and 3 are important to me too.
This is not a database design issue.
This is an issue with the processes around using the database design.
The real question I have is why are users entering in stores ad-hoc? I can think of scenarios, but without knowing your situation it is hard to guess.
The normal solution is that the tblStore table is a lookup table only. Normally users only have access to stores that have already been entered.
Then there is a controlled process to maintain the tblStore table in a consistent manner. Only a few users would have access to this process.
Of course as I alluded to above this is not always possible, so you may need a different solution.
UPDATE:
Question #1: An update script is the best approach. The best way to do this is to have a copy of the database if possible, or a close copy if not, and test the script against this data. Once you have ensured that the script runs correctly, then you can run it against the real data.
If you have transactional integrity, you should use it: issue a BEGIN before running the script, and if the number of records affected is what you expect (and any other tests you devise, perhaps also scripted, pass), then you can COMMIT.
Do not type in SQL against a live DB.
Question #3: I suggest your first line of attack is to create processes around the creation of new stores, but this may not be within your ambit.
The second is possibly to get proactive and identify and enter new stores (if this is the problem) before the users in the field need to do so. I don't know if this works inside your scenario.
Lastly, if you had a script that merged "store1" into "store2", you could standardise on that as a way of reducing time and errors. You could even possibly build that into an admin-only screen that automated merging stores.
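As a sketch of such a merge script (T-SQL-flavoured and untested; @KeepID and @DupID stand for the surviving and duplicate store IDs):

BEGIN TRANSACTION;

UPDATE tblPurchased
SET storeID = @KeepID
WHERE storeID = @DupID;    -- repoint purchases at the surviving store

DELETE FROM tblStore
WHERE storeID = @DupID;    -- then remove the duplicate store row

-- Inspect the affected row counts, then:
COMMIT;                    -- or ROLLBACK if the counts look wrong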
That is all I can think of off the top of my head.

Script to copy data from one Informix database to another

I have a need to copy data from one Informix database to another. I do not want to use LOAD for doing this. Is there any script that can help me with this? Is there any other way to do this?
Without a bit more information about the types of Informix databases you have, it's hard to say exactly what the best option is for you.
If it's a small number of tables and large volumes of data, have a look at onunload, onload and/or the High Performance Loader. (I'm assuming we're not talking about Standard Engine here.)
If on the other hand you have lots of tables and HPL will be too fiddly, have a look at myexport/myimport (available on the iiug.org site). These are non-locking equivalents of the standard dbexport/dbimport utilities.
The simplest solution is to backup the database instance and restore it to a separate instance. If this is not possible for you then there are other possibilities.
dbexport/dbimport
unload/load
hand-crafted SQL inserts
If the database structure is identical then you can use dbexport/dbimport, however this will unload the data to flat files, either in the file system or on tape and then import from the flat files.
I generally find that if the DB structure is the same then load/unload is the easiest solution.
If you do not want to use load/unload or dbimport/dbexport, then you can use direct SQL INSERTs as follows (untested; you will need to check the syntax):
INSERT INTO dbname2@informix_server2:table_name
SELECT * FROM dbname1@informix_server1:table_name;
This of course implies an identical table structure; you could use a column list if the structures differ.
One area that will cause you issues is referential integrity. If you have foreign keys then this will cause you a problem as you will need to ensure the inserts are done in the correct order. You may also have issues with SERIAL columns and INSERTS. Load does not suffer from this problem as you can load into a table with a serial value and retain the original values.
I have often found that the best solution is as follows:
1. Take a schema from database1.
2. Split it into two parts: the initial segment is all the table creation statements; the second part is all of the CREATE INDEX, referential integrity etc. statements.
3. Create database2 from the first part of the schema.
4. Use UNLOAD/LOAD to load the data into database2 (see the sketch below).
5. Apply the second part of the schema to database2.
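For step 4, UNLOAD and LOAD are run from dbaccess, once per table; a minimal sketch with placeholder table and file names:

-- connected to database1:
UNLOAD TO 'customer.unl' SELECT * FROM customer;

-- then, connected to database2:
LOAD FROM 'customer.unl' INSERT INTO customer;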
This is very similar to the process that dbimport goes through but historically I have not been able to use dbimport as my database contains synonyms to another database and dbimport did/does not work with these.
UNLOAD and LOAD are the simplest way of doing it. By precluding them, you also preclude the use of DB-Load, DB-Access, DB-Export and DB-Import, which are the easiest ways to do it.
As already noted, you could consider using HPL.
You could also set up an ER (Enterprise Replication) system - it is harder than UNLOAD followed by LOAD, but doesn't use the verboten operations.
If the two machines are substantially identical, you could consider onunload and onload; I would not recommend it.

Obfuscate a SQL Server Db schema

When posting example code or filing bug reports based on a real production app, it would be helpful to have some way to change the table and column names to not potentially give away information about the internals of the app. Doing it by hand without breaking things is time consuming. Does anything automatic exist? Ideally it would use real English words so they are more easily referred to than random text strings.
As long as you don't use real data, I don't see what the issue is. Most schemas are fairly obvious from the requirements, e.g. a CRM system will have customer name, address, etc. (or customer name, addressID, etc., with some address table holding the parts of the address). Knowing your schema gives me no idea how you implement your app, and generally, without the stored procedures or program code, it would be hard to steal any intellectual property. Even if you were the NSA or something (InternetIP, PacketHeadingID, PacketDetailID, TimeStampID), the structure of the tables alone would tell me nothing about how your system for logging all the internet traffic actually works, nor about what is actually logged.
I don't know of anything offhand that does what you are requesting, but I would think it is fairly easy to write a script to do it on your own: look at the table columns and datatypes, call text columns "TextColumn1", int columns "IntColumn2", etc., build a table of substitutions, and then perform the substitutions globally in the script file. This would be a fairly easy Python/Perl/PowerShell/Ruby/VBScript program.
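As a rough sketch of that substitution script (in C++ to match the rest of this thread; the identifier map and file names are made up):

#include <fstream>
#include <map>
#include <regex>
#include <sstream>
#include <string>

int main()
{
    // Map real names to neutral placeholders; in practice you would build
    // this from the catalog views rather than by hand.
    const std::map<std::string, std::string> subs = {
        { "Customer",    "Table1"      },
        { "CreditLimit", "IntColumn1"  },
        { "Notes",       "TextColumn1" },
    };

    std::ifstream in("schema.sql");
    std::stringstream buf;
    buf << in.rdbuf();
    std::string text = buf.str();

    for (const auto& sub : subs) {
        // \b anchors avoid renaming partial matches inside longer identifiers.
        text = std::regex_replace(text,
                                  std::regex("\\b" + sub.first + "\\b"),
                                  sub.second);
    }

    std::ofstream out("schema_obfuscated.sql");
    out << text;
}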
I agree that there's no real need to do so, but if you feel that way, take a look at anonymizers, usually used to protect the data and not the schemas, but you could easily apply those approaches to schemas as well.
See this paper (which is the description of this framework), especially page 8 and onwards, for different anonymization methods, although replacing column names with static strings would probably be good enough anyway.
