How do I test a code generation tool? - database

I am currently developing a small project of mine that dynamically generates SQL calls to be used by another piece of software. The SQL calls are not known beforehand, and therefore I would like to be able to unit test the object that generates the SQL.
Do you have any idea what the best approach to this would be? Bear in mind that there is no way to know all the possible SQL calls that might be generated.
Currently the only idea I have is to create test cases for the SQL accepted by the database using regexes and make sure that the SQL compiles, but this does not ensure that the call returns the expected result.
Edit: adding more info.
My project is an extension of Boo that will allow the developer to tag his properties with a set of attributes. These attributes are used to identify how the developer wants to store the object in the DB. For example:
# This attribute tells the Boo compiler extension that you want to
# store the object in a MySQL db. The boo compiler extension will make
# sure that you meet the requirements
[Storable(MySQL)]
class MyObject():
    # Tells the compiler that name is the PK
    [PrimaryKey(Size = 25)]
    [Property(Name)]
    private name as String

    [TableColumn(Size = 25)]
    [Property(Surname)]
    private surname as String

    [TableColumn()]
    [Property(Age)]
    private age as int
The great idea is that the generated code won't need to use reflection; instead it will be added to the class at compile time. Yes, compilation will take longer, but there won't be any need for reflection at all. I currently have working code that generates the required methods returning the SQL at compile time; they are added to the object and can be called, but I need to test that the generated SQL is correct :P

The whole point of unit testing is that you know the answer to compare the code results against. You have to find a way to know the SQL calls beforehand.
To be honest, as other answerers have suggested, your best approach is to come up with some expected results, and essentially hard-code those in your unit tests. Then you can run your code, obtain the result, and compare against the hard-coded expected value.
You could also record the actual SQL that is generated and compare that, rather than executing it and comparing the results.
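If it helps, here is a minimal sketch of that string-comparison style of test in Python (the generator function is a hypothetical stand-in; the real project is a Boo compiler extension, so the actual test would live in NUnit or similar):
import unittest

# Hypothetical stand-in for the compile-time generator under test;
# in the real project this would be the Boo compiler extension's output.
def generate_insert_sql(table, columns):
    cols = ", ".join(columns)
    params = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({params})"

class GeneratedSqlTests(unittest.TestCase):
    def test_insert_for_myobject(self):
        # The expected SQL is hard-coded: that is the "known answer".
        expected = "INSERT INTO MyObject (name, surname, age) VALUES (?, ?, ?)"
        actual = generate_insert_sql("MyObject", ["name", "surname", "age"])
        self.assertEqual(expected, actual)

if __name__ == "__main__":
    unittest.main()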

This seems like a chicken-and-egg situation. You aren't sure what the generator will spit out, and you have a moving target to test against (the real database). So you need to tie down the loose ends.
Create a small test database (for example with HSQLDB or Derby). This database should use the same features as the real one, but don't make a copy! You will want to understand what each thing in the test database is for and why it is there, so invest some time to come up with some reasonable test cases. Use your code generator against this (static) test database, save the results as fixed strings in your test cases. Start with a single feature. Don't try to build the perfect test database as step #1. You will get there.
When you change the code generator, run the tests. They should only break in the expected places. If you find a bug, replicate the feature in question in your test database. Create a new test, check the result. Does it look correct? If you can see the error, fix the expected output in the test. After that, fix the generator so it will create the correct result. Close the bug and move on.
This way, you can build more and more safe ground in a swamp. Do something you know, check whether it works (ignore everything else). If you are satisfied, move on. Don't try to tackle all the problems at once. One step at a time. Tests don't forget, so you can forget about everything that is being tested and concentrate on the next feature. The test will make sure that your stable foundation keeps growing until you can erect your skyscraper on it.
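As a concrete illustration of the "static test database plus fixed expected strings" idea, here is a sketch using Python's built-in sqlite3 as the stand-in test database (a simplification: SQLite's dialect differs from MySQL's) and a hypothetical generated statement:
import sqlite3
import unittest

# Hypothetical generator output for one feature; in practice this string
# would come from running your code generator against the static test schema.
GENERATED_SQL = "SELECT name, surname, age FROM MyObject WHERE name = ?"

# The "golden" string saved in the test suite when the output was last reviewed.
EXPECTED_SQL = "SELECT name, surname, age FROM MyObject WHERE name = ?"

class GoldenSqlTests(unittest.TestCase):
    def setUp(self):
        # Tiny, hand-built test database: you know why every row is there.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE MyObject (name TEXT PRIMARY KEY, surname TEXT, age INTEGER)")
        self.db.execute("INSERT INTO MyObject VALUES ('Ada', 'Lovelace', 36)")

    def test_sql_matches_golden_string(self):
        self.assertEqual(EXPECTED_SQL, GENERATED_SQL)

    def test_sql_runs_and_returns_expected_row(self):
        row = self.db.execute(GENERATED_SQL, ("Ada",)).fetchone()
        self.assertEqual(("Ada", "Lovelace", 36), row)

if __name__ == "__main__":
    unittest.main()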

Regarding the regex idea: I think the grammar of SQL is not regular, but context-free; nested subexpressions are the key to realizing this, and also the reason a regex alone cannot fully validate it. You may want to write (or reuse) a context-free parser for SQL to check for syntax errors.
But ask yourself: what is it you want to test for? What are your correctness criteria?
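If syntax validity is one of your criteria, you don't have to write the parser yourself. Here is a sketch using the third-party sqlglot parser for Python (an assumption on my part that adding such a dependency is acceptable):
import sqlglot
from sqlglot.errors import ParseError

def check_syntax(sql: str, dialect: str = "mysql") -> bool:
    """Return True if the generated SQL parses in the given dialect.

    This only proves the statement is syntactically valid; it says nothing
    about whether it returns the expected result.
    """
    try:
        sqlglot.parse_one(sql, read=dialect)
        return True
    except ParseError:
        return False

print(check_syntax("SELECT name FROM MyObject WHERE age > 18"))  # expected: True
print(check_syntax("SELECT name FROM MyObject WHERE (age >"))    # expected: False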

If you are generating the code, why not also generate the tests?
Short of that, I would test/debug generated code in the same way you would test/debug any other code without unit tests (i.e. by reading it, running it and/or having it reviewed by others).

You don't have to test all cases. Build a collection of example calls, making sure to include as many of the difficult aspects the generator will have to handle as possible, then check whether the generated code is correct.

I would have a suite of tests that put in a known input and check that the generated SQL is as expected.
You're never going to be able to write a test for every scenario, but if you write enough to cover at least the most common patterns, you can be fairly confident your generator is working as expected.
If you find it doesn't work in a specific scenario, write another test for that scenario and fix it.
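For example, a table-driven suite makes it cheap to add one more scenario each time something breaks; a sketch in Python with pytest (the generator function and the cases are hypothetical):
import pytest

# Hypothetical stand-in for the generator; the real one is produced by the
# Boo compiler extension.
def generate_select_sql(table, columns):
    return "SELECT {} FROM {}".format(", ".join(columns), table)

# Known input -> expected SQL pairs; add a new row whenever a scenario breaks.
CASES = [
    (("MyObject", ["name"]), "SELECT name FROM MyObject"),
    (("MyObject", ["name", "surname", "age"]),
     "SELECT name, surname, age FROM MyObject"),
]

@pytest.mark.parametrize("args,expected", CASES)
def test_generated_sql(args, expected):
    table, columns = args
    assert generate_select_sql(table, columns) == expected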

Related

What type of database for storing ML experiments

I'm thinking of writing a small piece of software to run ML experiments on a cluster, or on some abstracted executor, and then save the results so that I can view them efficiently in real time. The executor software will have write access to the database and will push metrics live. I have not worked much with databases, so I'm not sure what the correct approach is. Here is a description of what the system should store:
Each experiment will consist of a single piece of code (or archive of code) that can be executed on the remote machine. For now we will assume all dependencies etc. are installed there. The code will accept command-line arguments, and the experiment will also include a YAML schema defining those arguments. The code itself will specify what gets logged (e.g. I will provide a library in the language for registering channels). In terms of logging, you can log numerical values, arrays, text, etc., so quite a few types. Each channel will be allowed a single specification (e.g. 2 columns: first an int iteration, second a float error). The code will also produce a final copy of the parameters at the end of the experiment.
When one submits an experiment, one will need to provide a unique group name plus parameters for execution. This will launch the experiment and log everything.
For me, the easiest way to implement this is with a flat file system. Each project will have a unique name. Each new experiment gets a unique id and a folder inside the project, where I can store the code. Each channel gets a file, which for simplicity can be CSV-delimited, with a special schema file describing what types of values are stored there so I can load them later. The final parameters can also be copied into the folder.
However, because of the variety of ways I could do this, and the fact that this might require a separate "table" for each experiment, I have no idea whether this is possible in any database system. Also, maybe I'm overlooking something very obvious; if you have any experience with this, any suggestions/advice are most welcome. The main goal is to be able to serve all of this to a web interface in the end. Maybe NoSQL could accommodate this, maybe not (I don't know exactly how those work)?
The data for ML would primarily be unstructured. That kind of data does not fit naturally into an RDBMS. Essentially, a document database like MongoDB is far better suited for such cases.
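To make that concrete, here is a rough sketch of how the runs and channels described above might map onto MongoDB documents using pymongo (server location, collection and field names are all illustrative assumptions):
from pymongo import MongoClient

# Assumes a MongoDB server on localhost; all names here are illustrative.
client = MongoClient("mongodb://localhost:27017")
db = client["ml_experiments"]

# One document per experiment run: schema-free, so each run can carry its
# own argument schema and channel specifications.
run_id = db.runs.insert_one({
    "project": "my_project",
    "group": "baseline",
    "args_schema": {"lr": "float", "epochs": "int"},  # from the YAML schema
    "args": {"lr": 0.01, "epochs": 10},
    "channels": {"train_error": {"columns": ["iteration:int", "error:float"]}},
}).inserted_id

# Metrics are pushed live as small documents referencing the run.
db.metrics.insert_one({"run_id": run_id, "channel": "train_error",
                       "iteration": 1, "error": 0.42})

# The web interface can then stream a channel with a simple query.
for point in db.metrics.find({"run_id": run_id, "channel": "train_error"}).sort("iteration"):
    print(point["iteration"], point["error"])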

Sub-queries using XSB Prolog's C API (embedded)

I have a program (C++) that embeds XSB Prolog to use as a constraint solver. I have already written code using the low-level C API to inject facts and run queries. But I am getting stuck on a particular side-problem.
I would like (for debugging purposes) to run a query and then output each term that the query unifies with to a stream. To make the output readable, I thought I would use string:term_to_atom/2 to generate the strings.
So, I'd like to put the query term in register 1, run xsb_query(), and then run string:term_to_atom/2 on the results. But running string:term_to_atom/2 is a query itself, and you can't run xsb_query() when you are in the middle of a query.
I tried using xsb_query_save(), hoping that I could then do a sub-query, followed by xsb_query_restore(), but that doesn't appear to work. The call to my sub-query still bombs out because there is already a query in progress.
I thought about saving a vector of variables created with p2p_new() that have been unified using p2p_unify() with reg_term(1), but I have no idea how or when these terms might get garbage collected, as I see no way for XSB Prolog to know that my C program is using them. (Unless I am supposed to call the undocumented p2p_deref() on them when I am done with them?)
Finally, I would like to do this in a single query (if possible) in order to avoid cluttering up the namespace with what would amount to a temporary rule. But maybe I am trying too hard and I should be using another approach entirely. Ideas?

Is there a way to translate database table rows into Prolog facts?

After doing some research, I was amazed by the power of Prolog to express queries in a very simple way, almost like telling the machine verbally what to do. This happened because I've become really bored with Propel and PHP at work.
So, I've been wondering if there is a way to translate database table rows (Postgres, for example) into Prolog facts. That way, I could stop writing so many tedious joins through an ORM, and instead write something like this to get what I want:
mantenedora_ies(ID_MANTENEDORA, ID_IES) :-
    papel_pessoa(ID_PAPEL_MANTENEDORA, ID_MANTENEDORA, 1),
    papel_pessoa(ID_PAPEL_IES, ID_IES, 6),
    relacionamento_pessoa(_, ID_PAPEL_IES, ID_PAPEL_MANTENEDORA, 3).
To see why I've become bored, look at this post. The code there would be replaced by the simple lines above, which are much easier to read and understand. I'm just asking out of curiosity, since it will be impossible to replace things around here.
It would also be cool if something like that could be done from PHP. Does anyone know of something like that?
Check the ODBC interface of SWI-Prolog (maybe there is something equivalent for other Prolog implementations too):
http://www.swi-prolog.org/pldoc/doc_for?object=section%280,%270%27,swi%28%27/doc/packages/odbc.html%27%29%29
I can think of a few approaches to this -
On initialization, call a method that selects all data from a table and asserts it into the Prolog database; do this for each table. You will need to declare the shape of each row, e.g. :- dynamic ies_row/4 (see the sketch after this list).
You could modify load_files by overriding user:prolog_load_file/2; from that hook you could do something similar to #1. This has the benefit of looking like a load_files call. http://www.swi-prolog.org/pldoc/man?predicate=prolog_load_file%2F2 ... That documentation mentions library(http_load), but I cannot find it anywhere (I was interested in this recently)!
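As a sketch of the data-gathering half of approach #1, assuming psycopg2 and illustrative table/column names, you could dump the rows to a file of facts and then consult or assert them:
import psycopg2

# Connection details, table and column names are illustrative.
conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor()
cur.execute("SELECT id_papel, id_pessoa, tipo FROM papel_pessoa")

# Write each row as a papel_pessoa/3 fact; the generated file can be consulted,
# or the same strings asserted through the ODBC interface instead.
# Assumes numeric columns; string values would need quoting as Prolog atoms.
with open("facts.pl", "w") as out:
    out.write(":- dynamic papel_pessoa/3.\n")
    for id_papel, id_pessoa, tipo in cur.fetchall():
        out.write(f"papel_pessoa({id_papel}, {id_pessoa}, {tipo}).\n")

cur.close()
conn.close()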
There is the Draxler Prolog-to-SQL compiler, which translates patterns like the conjunction you wrote into the more verbose SQL joins. You can find more info in the related post (Prolog to SQL converter).
But beware that Prolog has its weaknesses too, especially regarding aggregates. Without a library, getting sums, counts and the like is not very easy, and such libraries aren't common or particularly easy to use.
I think you could try to specialize the PHP DB interface for equijoins, using built-in features that allow you to shorten the query text (when this results in more readable code). Working in SWI-Prolog / ODBC, where (like in PHP) you need to compose SQL, I effectively found myself working that way to handle something very similar to what you showed in the other post.
Another approach I found useful: I wrote a parser for the subset of SQL used by the MySQL backup interface (phpMyAdmin, really). So I routinely dump my CMS's DB locally, load it into memory, apply whatever task I need, compute and write (or apply) the insert/update/delete statements, then upload these. This works because the DB is small enough to fit in memory. I developed, and now maintain, a small e-commerce site with this naive approach.
Writing Prolog from PHP should not be too difficult: I'd try to modify an existing interface, like the excellent Adminer, which already offers a choice of basic serialization formats.

Unit testing a select query

I am returning a set of rows, each representing a desktop machine.
I am stumped on finding a way to unit test this. There aren't really any edge cases or criteria I can think of to test. It's not like share prices, where I might want to check that the data really is 5 months old. It's not like storing person details, where you could check that a certain length always works, or special characters, etc., or currency values in different currencies (£, $, etc.) stored as strings.
How would I test this sort of resultset?
Also, in testing the result set of a query, there are a few problems:
1) Testing that you get the same number of rows as when you run the query on the server is brittle, because someone might change the table data. Is this where a test server comes in, one that nobody changes except via uploaded change scripts?
2) Do you test that the dataset object is not null? That is, if it's instantiated as null but is not null after the query executes, it's holding a value (though this doesn't prove the data is correct, just that data has been retrieved).
Thanks
You can use a component like NBuilder to simulate your database. Since you can manage all aspects of your dataset, you can test several aspects of the database interaction: the number of records your query returns, the range of values in some field. And because the dataset is always created with the arguments you choose, the data are always the same, so your tests are reproducible and completely decoupled from your database.
1 -
a) Never ever test against the production server, for any reason.
b) Tests should start from a known configuration, which you can achieve either with mock objects or with a test database (some might argue that unit tests should use mock objects and integration tests should use a test database).
As for test cases, start with a basic functionality test - put a "normal" row in and make sure you get it back. You'll appreciate having that test if you later refactor. Make sure the program responds correctly to columns being null, or blank. Put the maximum and minimum values in all the DB fields and make sure the object fields you're storing them in can fit that resolution. Check duplicate records in the DB, or missing records. If you have production data, grab a snapshot of it to put in your test DB and make sure that loads correctly. Is there a value that chronically causes difficulties in other parts of the program? Check it here too. And once you release the code, add to the test list any values you find in production that break the system (Regression testing).
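A minimal sketch of the "put a normal row in and make sure you get it back" and null-handling tests, using an in-memory SQLite database as the known starting configuration (table and column names are invented):
import sqlite3
import unittest

class DesktopQueryTests(unittest.TestCase):
    def setUp(self):
        # Known configuration: an in-memory DB rebuilt for every test.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE desktops (hostname TEXT, os TEXT, ram_mb INTEGER)")

    def test_normal_row_round_trip(self):
        self.db.execute("INSERT INTO desktops VALUES ('PC-001', 'Windows 10', 8192)")
        rows = self.db.execute("SELECT hostname, os, ram_mb FROM desktops").fetchall()
        self.assertEqual([("PC-001", "Windows 10", 8192)], rows)

    def test_null_columns_are_handled(self):
        self.db.execute("INSERT INTO desktops (hostname) VALUES ('PC-002')")
        row = self.db.execute("SELECT hostname, os, ram_mb FROM desktops").fetchone()
        self.assertEqual(("PC-002", None, None), row)

if __name__ == "__main__":
    unittest.main()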

Find redundant pages, files, stored procedures in legacy applications

I have several horrors of old ASP web applications. Does anyone have any easy ways to find what scripts, pages, and stored procedures are no longer needed? (besides the stuff in "old___code", "delete_this", etc ;-)
Chances are that if a stored proc won't run, it isn't being used, because nobody ever bothered to update it when something else changed. Table columns that are null for every single record are probably not being used either.
If you have your stored procs and database objects in source control (and if you don't, why not?), you might be able to search through and find what other code they were moved to production with, which should give you a clue as to what might call them. You will also be able to see who touched them last, and that person might know whether they are still needed.
I generally approach this by first listing all the procs (you can get this from the system tables) and then marking off the ones I know are being used. Profiler can help you here, as you can see which are commonly called. (But don't assume that because Profiler didn't show a proc it isn't being used; that just gives you the list of ones to research.) This makes the list of procs that need to be researched much smaller. Depending on your naming convention, it might be relatively easy to see which part of the code should use them. When researching, don't forget that procs are called in places other than the application, so you will need to check jobs, DTS or SSIS packages, SSRS reports, other applications, triggers, etc. to be sure something is not being used.
Once you have identified a list of procs you don't think you need, share it with the rest of the development staff and ask if anyone knows whether each one is needed. You'll probably get a couple more taken off the list this way because they are used for something specialized. Then, when you have the list, change the names to some convention that identifies them as candidates for deletion. At the same time set a deletion date (how far out that date is depends on how often something might be called; if it is called something like AnnualXYZReport, make that date a year out). If no one complains by the deletion date, delete the proc (of course, if it is in source control you can always get it back even then).
Once you have gone through the hell of identifying the bad ones, it is time to realize you need to train people that part of the development process is identifying procs that are no longer being used and getting rid of them as part of any change to a section of code. Depending on code reuse, this may mean searching the code base to see whether some other part of it uses the proc and then doing the same thing as above: let everyone know it will be deleted on a given date, change the name so that any code referencing it will break, and then get rid of it on that date. Or you could have a metadata table where you record candidates for deletion as soon as you know you have stopped using something, and send a report around once a month or so to ask whether anyone else still needs them.
I can't think of any easy way to do this, it's just a matter of identifying what might not be used and slogging through.
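One way to take some of the slog out of the code-base search mentioned above is a throwaway script; a sketch in Python (proc names, path and extensions are illustrative, and a hit count of zero is only a hint, not proof, since jobs, packages and reports live outside the code tree):
import os

# Candidate procs you suspect are dead, and the source tree to search.
CANDIDATES = ["usp_AnnualXYZReport", "usp_OldImport"]   # illustrative names
CODE_ROOT = "C:/src/legacy_asp_app"                     # illustrative path
EXTENSIONS = (".asp", ".vbs", ".sql", ".dts", ".config")

hits = {name: [] for name in CANDIDATES}
for root, _dirs, files in os.walk(CODE_ROOT):
    for fname in files:
        if not fname.lower().endswith(EXTENSIONS):
            continue
        path = os.path.join(root, fname)
        try:
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read().lower()
        except OSError:
            continue
        for name in CANDIDATES:
            if name.lower() in text:
                hits[name].append(path)

# Anything with no hits is a candidate for the rename-then-delete treatment.
for name, paths in hits.items():
    print(name, "->", paths or "no references found in this code base")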
For SQL Server only, 3 options that I can think of:
modify the stored procs to log usage
check if code has no permissions set
run profiler
And of course, remove access or delete it and see who calls...
