Sub-queries using XSB Prolog's C API (embedded)

I have a program (C++) that embeds XSB Prolog to use as a constraint solver. I have already written code using the low-level C API to inject facts and run queries. But I am getting stuck on a particular side-problem.
I would like (for debugging purposes) to run a query and then output each term that the query unifies with to a stream. To make the output readable, I thought I would use string:term_to_atom/2 to generate the strings.
So, I'd like to put the query term in register 1, run xsb_query(), and then run string:term_to_atom/2 on the results. But running string:term_to_atom/2 is a query itself, and you can't run xsb_query() when you are in the middle of a query.
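For concreteness, the single-level loop I already have working looks roughly like this (a simplified sketch: demo_fact/2 stands in for my real predicates, error handling is omitted, and I am using the sequential, non-CTXT flavor of the C interface):

/* Simplified sketch of my existing query loop; xsb_init() has already
 * been called elsewhere. demo_fact/2 is a stand-in for my real predicate. */
#include <stdio.h>
#include "cinterf.h"

void dump_answers(void)
{
    prolog_term answer;
    int rc;

    /* Build the goal demo_fact(X, Y) in register 1 (register 1 must hold
     * a free variable at this point). */
    c2p_functor("demo_fact", 2, reg_term(1));

    rc = xsb_query();                  /* run the goal in register 1 */
    while (rc == XSB_SUCCESS) {
        /* As I read the manual, the bindings come back as the arguments of
         * the term in register 2 (the ret/N convention); adjust if your
         * build exposes them through the register-1 goal instead. */
        answer = reg_term(2);
        if (is_string(p2p_arg(answer, 1)))
            printf("X = %s\n", p2c_string(p2p_arg(answer, 1)));
        if (is_int(p2p_arg(answer, 2)))
            printf("Y = %d\n", (int) p2c_int(p2p_arg(answer, 2)));

        /* This is the point where I would like to call string:term_to_atom/2
         * on the bindings, but that is itself a query. */
        rc = xsb_next();
    }
    /* If I ever stop before exhausting the answers, I call
     * xsb_close_query() here instead. */
}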
I tried using xsb_query_save(), hoping that I could then do a sub-query, followed by xsb_query_restore(), but that doesn't appear to work. The call to my sub-query still bombs out because there is already a query in progress.
I thought about saving a vector of variables created with p2p_new() that have been unified using p2p_unify() with reg_term(1), but I have no idea how or when these terms might get garbage collected, as I see no way for XSB Prolog to know that my C program is using them. (Unless I am supposed to call the undocumented p2p_deref() on them when I am done with them?)
Finally, I would like to do this in a single query (if possible) in order to avoid cluttering up the namespace with what would amount to a temporary rule. But maybe I am trying too hard and I should be using another approach entirely. Ideas?

Related

Is there a way to split Flink tasks in the cluster GUI besides using .startNewChain()?

The Flink cluster's GUI is really useful to me, particularly the job plan, which shows the number of records sent from one part of the job to another. But an issue I have come across is that if you don't use .startNewChain() between functions, the record counts shown as sent from one function to another are misleading.
To give an example:
In this example, the code uses .startNewChain() on the finalErrorOutputStream.
When this is run in the cluster, the GUI displays the following:
The outputDataStream has output 10,476 records, and the finalErrorOutputStream is shown as a separate task (not sure if "task" is technically the right term, but it is what I am calling it) that has received 8,860 records.
Now, if we remove the .startNewChain() from the finalErrorOutputStream, we get this in the GUI:
The outputDataStream has output 10,507 records, and we don't know how many of those have gone to the finalErrorOutputStream (yes, we could set up graphs in the task metrics tab, but the goal is to be able to tell from this standard overview). And because there is a sink for finalErrorOutputStream, it looks as if finalErrorOutputStream hasn't output any records. If you showed this to someone unfamiliar with Flink and the reasoning behind it, it would be confusing.
So using .startNewChain() is a better way to show the breakdown of how many records went where, BUT the issue is that .startNewChain() has a performance impact.
And there are some jobs I have seen where, if you don't use .startNewChain(), the job's plan is just a single square even though a lot is going on inside it.
So my question is: is .startNewChain() the only way to get this behavior, or is there some other option that will provide that insight into the job's "plan"?
Unfortunately no, there is no other way. The GUI, and the backing data-structures it reads the data from, are only aware of tasks (you used the right term), not operators (the individual parts of a task).

Is it possible to have [FMResultSet previous] and/or [FMResultSet goRecord:1]?

To use SQLite to populate UITableViews, I copy all the data into NSArrays/NSDictionaries so I can move back and forward through the list. However, this is wasteful and requires having the data twice.
I wonder if it is possible to have [FMResultSet previous] and/or [FMResultSet goRecord:RECNO] so I can get rid of the copy and use an FMResultSet directly.
Does the SQLite API provide this possibility?
There is a nice article by Richard Hipp, the creator of SQLite, about why SQLite does not have the functionality of stepping backwards, and why this is a very hard problem in general. The article also offers some workarounds that you might find interesting.
Quoting from that article,
The sqlite3_step() function above seems to advance a cursor forwards through the result set. It is natural to then ask why there is not a corresponding sqlite3_step_backwards() function to move backwards through the result set. It seems like sqlite3_step_backwards() should be easy to implement, after all. Just step backwards through the result set instead of forwards...
But it is not easy at all. In fact, the designers of SQLite have been unable to think of an algorithm that will do the job in the general case. Stepping backwards through the result set is easy for some special cases, such as the simple example query above. But things get more complicated, for example, if the query were really a 4-way LEFT OUTER JOIN with subqueries both in the result columns and in the WHERE clause.
The problem is that the sqlite3_step() function does not step through a precomputed result set at all. A better and more realistic way to think about matters is to suppose that each prepared statement is really a computer program. You are running this program in a debugger and there is a breakpoint set on a single statement somewhere deep down inside the computation. Calling the sqlite3_step() function is like pressing the "Run" button in your debugger and thus asking the debugger to run the program until it either exits or hits the breakpoint. Sqlite3_step() returns SQLITE_ROW if it hits the breakpoint and SQLITE_DONE if it finishes. If you hit the breakpoint, you can then look at local variable in order to find the values of a "row". Then you can press the "Run" button (call sqlite3_step()) again to continue execution until the next breakpoint or until the program exits.
From this point of view (which is much closer to how SQLite works on the inside) asking for an sqlite3_step_backward() button is really like expecting your symbolic debugger to be able to run backwards or to "undo" its execution back to the previous breakpoint. Nobody reasonably expects debuggers to be able to do this, so you shouldn't expect SQLite to be able to sqlite3_step_backward() either.
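To make the forward-only model concrete, the usual prepare/step/finalize loop in the C API looks like this (a minimal sketch; the people table and its columns are invented for illustration). The only ways to revisit an earlier row are to sqlite3_reset() the statement and step forward again, or to cache the rows yourself, which is essentially the copying you are doing now:

/* Minimal sketch of SQLite's forward-only cursor model.
 * The "people" table and its columns are invented for illustration. */
#include <stdio.h>
#include <sqlite3.h>

static void walk_rows(sqlite3 *db)
{
    sqlite3_stmt *stmt = NULL;

    if (sqlite3_prepare_v2(db, "SELECT id, name FROM people ORDER BY id",
                           -1, &stmt, NULL) != SQLITE_OK)
        return;

    /* Each sqlite3_step() resumes the query "program" until it produces
     * the next row; there is no step-backwards counterpart. */
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        int id = sqlite3_column_int(stmt, 0);
        const unsigned char *name = sqlite3_column_text(stmt, 1);
        printf("%d: %s\n", id, name ? (const char *) name : "(null)");
    }

    /* To see earlier rows again you must rewind and replay from the start... */
    sqlite3_reset(stmt);
    /* ...or keep your own cache of the rows you have already seen. */

    sqlite3_finalize(stmt);
}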

Achieving high-performance transactions when extending PostgreSQL with C-functions

My goal is to achieve the highest performance available for copying a block of data from the database into a C-function to be processed and returned as the result of a query.
I am new to PostgreSQL and I am currently researching possible ways to move the data. In particular, I am looking for nuances or keywords specific to PostgreSQL for moving large amounts of data fast.
NOTE:
My ultimate goal is speed, so I am willing to accept answers outside of the exact question I have posed as long as they deliver big performance results. For example, I have come across the COPY keyword (PostgreSQL only), which quickly moves data from tables to files and vice versa. I am trying to stay away from processing that is external to the database, but if it provides a performance improvement that outweighs the obvious drawback of external processing, then so be it.
It sounds like you probably want to use the server programming interface (SPI) to implement a stored procedure as a C language function running inside the PostgreSQL back-end.
Use SPI_connect to set up the SPI.
Now SPI_prepare_cursor a query, then SPI_cursor_open it. SPI_cursor_fetch rows from it and SPI_cursor_close it when done. Note that SPI_cursor_fetch allows you to fetch batches of rows.
SPI_finish to clean up when done.
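Put together, a cursor-based scan inside a C-language function looks roughly like this (a sketch, not a drop-in implementation: the big_table table, the function name, and the batch size of 1000 are invented, and error checking is omitted):

/* Sketch of an SPI cursor loop inside a C-language function.
 * "big_table", the function name and the batch size are invented. */
#include "postgres.h"
#include "fmgr.h"
#include "executor/spi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(scan_big_table);

Datum
scan_big_table(PG_FUNCTION_ARGS)
{
    SPIPlanPtr  plan;
    Portal      portal;
    int64       total = 0;
    uint64      i;

    SPI_connect();

    /* Prepare the query and open a cursor over it (no parameters here). */
    plan = SPI_prepare_cursor("SELECT id, payload FROM big_table", 0, NULL, 0);
    portal = SPI_cursor_open("big_table_cur", plan, NULL, NULL, true);

    for (;;)
    {
        /* Fetch a batch of rows; SPI_processed says how many arrived. */
        SPI_cursor_fetch(portal, true, 1000);
        if (SPI_processed == 0)
            break;

        for (i = 0; i < SPI_processed; i++)
        {
            /* SPI_getvalue gives the text form; real code would usually
             * grab the Datum with SPI_getbinval and work on it directly. */
            char *id = SPI_getvalue(SPI_tuptable->vals[i],
                                    SPI_tuptable->tupdesc, 1);
            (void) id;              /* process the row here */
        }
        total += SPI_processed;
    }

    SPI_cursor_close(portal);
    SPI_finish();

    PG_RETURN_INT64(total);
}

The batching is the important part: it keeps memory bounded while still amortizing the per-fetch overhead.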
You can return the result rows into a tuplestore as you generate them, avoiding the need to build the whole table in memory. See examples in any of the set-returning functions in the PostgreSQL source code. You might also want to look at the SPI_returntuple helper function.
See also: C language functions and extending SQL.
If maximum speed is of interest, your client may want to use the libpq binary protocol via libpqtypes so it receives the data produced by your server-side SPI-using procedure with minimal overhead.
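On the client side, the bare-libpq version of binary retrieval looks roughly like this (a sketch; libpqtypes layers a more convenient PQgetf-style API over the same mechanism, and the connection string, table, and column here are invented):

/* Client-side sketch: fetch a result in binary format with plain libpq.
 * The connection string, table and column are invented; libpqtypes wraps
 * this same mechanism with a friendlier API. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>      /* ntohl */
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=mydb");
    PGresult *res;
    int       i;

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* The final argument (resultFormat = 1) asks for binary result columns. */
    res = PQexecParams(conn, "SELECT id FROM big_table",
                       0, NULL, NULL, NULL, NULL, 1);

    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (i = 0; i < PQntuples(res); i++) {
            uint32_t raw;   /* an int4 arrives as 4 bytes in network order */
            memcpy(&raw, PQgetvalue(res, i, 0), sizeof raw);
            printf("id = %u\n", (unsigned) ntohl(raw));
        }
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}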

Is there a way to translate database table rows into Prolog facts?

After doing some research, I was amazed by the power of Prolog to express queries in a very simple way, almost like telling the machine verbally what to do. This happened because I've become really bored with Propel and PHP at work.
So I've been wondering if there is a way to translate database table rows (Postgres, for example) into Prolog facts. That way, I could stop writing so many boring joins and using the ORM, and instead write something like this to get what I want:
mantenedora_ies(ID_MANTENEDORA, ID_IES) :-
    papel_pessoa(ID_PAPEL_MANTENEDORA, ID_MANTENEDORA, 1),
    papel_pessoa(ID_PAPEL_IES, ID_IES, 6),
    relacionamento_pessoa(_, ID_PAPEL_IES, ID_PAPEL_MANTENEDORA, 3).
To see why I've become bored, look at this post. The code there would be replaced by the simple lines above, which are much easier to read and understand. I'm just curious about this, since it would be impossible to replace things around here anyway.
It would also be cool if something like that could be done from PHP. Does anyone know of something like that?
Check the ODBC interface of SWI-Prolog (maybe there is something equivalent for other Prolog implementations too):
http://www.swi-prolog.org/pldoc/doc_for?object=section%280,%270%27,swi%28%27/doc/packages/odbc.html%27%29%29
I can think of a few approaches to this -
On initialization, call a method that selects all the data from a table and asserts it into the Prolog database. Do this for each table. You will need to declare the shape of each row, e.g. :- dynamic ies_row/4.
You could modify file loading by overriding user:prolog_load_file/2. From this hook you could do something similar to #1. This has the benefit of looking like a load_files call. http://www.swi-prolog.org/pldoc/man?predicate=prolog_load_file%2F2 ... This documentation mentions library(http_load), but I cannot find it anywhere (I was interested in it recently)!
There is the Draxler Prolog-to-SQL compiler, which translates patterns like the conjunction you wrote into the more verbose SQL joins. You can find more info in the related post (Prolog to SQL converter).
But beware that Prolog has its weaknesses too, especially regarding aggregates. Without a library, getting sums, counts, and the like is not very easy, and such libraries aren't common or easy to use.
I think you could try to specialize the PHP DB interface for equijoins, using the built-in features that allow you to shorten the query text (when this results in more readable code). Working in SWI-Prolog / ODBC, where (as in PHP) you need to compose SQL, I actually found myself working that way to handle something very similar to what you showed in the other post.
Another approach I found useful: I wrote a parser for the subset of SQL used by the MySQL backup interface (PHPMyAdmin, really). So I routinely dump my CMS's DB locally, load it into memory, apply whatever task I need, computing and writing (or applying) the insert/update/delete statements, and then upload those. This works because the DB is small enough to fit in memory. I developed, and now maintain, a small e-commerce site with this naive approach.
Writing Prolog from PHP should not be too difficult: I'd try to modify an existing interface, like the awesome Adminer, which already offers a choice of basic serialization formats.

How do I test a code generation tool?

I am currently developing a small project of mine that dynamically generates SQL calls to be used by other software. The SQL calls are not known beforehand, and therefore I would like to be able to unit test the object that generates the SQL.
Do you have any idea what the best approach to this would be? Bear in mind that there is no way to know all the possible SQL calls to be generated.
Currently the only idea I have is to create test cases from the SQL accepted by the DB using regexes and make sure that the SQL will compile, but this does not ensure that the call returns the expected result.
Edited: Adding more info:
My project is an extension of Boo that will allow the developer to tag their properties with a set of attributes. These attributes are used to identify how the developer wants to store the object in the DB. For example:
# This attribute tells the Boo compiler extension that you want to
# store the object in a MySQL db. The Boo compiler extension will make
# sure that you meet the requirements
[Storable(MySQL)]
class MyObject():
    # Tells the compiler that name is the PK
    [PrimaryKey(Size = 25)]
    [Property(Name)]
    private name as String

    [TableColumn(Size = 25)]
    [Property(Surname)]
    private surname as String

    [TableColumn()]
    [Property(Age)]
    private age as int
The big idea is that the generated code won't need to use reflection; instead, it will be added to the class at compile time. Yes, the compilation will take longer, but there won't be a need to use reflection at all. I currently have this working: the required methods that return the SQL are generated at compile time, added to the object, and can be called, but I need to test that the generated SQL is correct :P
The whole point of unit testing is that you know the answer to compare the code's results to. You have to find a way to know the SQL calls beforehand.
To be honest, as other answerers have suggested, your best approach is to come up with some expected results, and essentially hard-code those in your unit tests. Then you can run your code, obtain the result, and compare against the hard-coded expected value.
Maybe you can record the actual SQL generated, rather than executing it and comparing the results, too?
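In other words, treat the generator as a function from a fixed input to a SQL string and assert on the exact text. Sketched below in C for brevity (generate_select_sql() is a hypothetical stand-in for whatever your generator exposes; in your project the same idea would be an NUnit-style test over the Boo compiler extension's output):

/* Sketch of "hard-coded expected SQL" testing. generate_select_sql() is a
 * hypothetical stand-in; replace it with the real code under test. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in generator so the sketch is self-contained. */
static char *generate_select_sql(const char *table, const char *pk_column)
{
    static char buf[256];
    snprintf(buf, sizeof buf,
             "SELECT name, surname, age FROM %s WHERE %s = ?",
             table, pk_column);
    return buf;
}

static void test_select_by_primary_key(void)
{
    const char *expected =
        "SELECT name, surname, age FROM MyObject WHERE name = ?";

    /* Compare the generated text against the known-good string. */
    assert(strcmp(generate_select_sql("MyObject", "name"), expected) == 0);
}

int main(void)
{
    test_select_by_primary_key();
    printf("all generator tests passed\n");
    return 0;
}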
This seems like a chicken-and-egg situation. You aren't sure what the generator will spit out, and you have a moving target to test against (the real database). So you need to tie down the loose ends.
Create a small test database (for example with HSQLDB or Derby). This database should use the same features as the real one, but don't make a copy! You will want to understand what each thing in the test database is for and why it is there, so invest some time to come up with some reasonable test cases. Use your code generator against this (static) test database, save the results as fixed strings in your test cases. Start with a single feature. Don't try to build the perfect test database as step #1. You will get there.
When you change the code generator, run the tests. They should only break in the expected places. If you find a bug, replicate the feature in question in your test database. Create a new test, check the result. Does it look correct? If you can see the error, fix the expected output in the test. After that, fix the generator so it will create the correct result. Close the bug and move on.
This way, you can build more and more safe ground in a swamp. Do something you know, check whether it works (ignore everything else). If you are satisfied, move on. Don't try to tackle all the problems at once. One step at a time. Tests don't forget, so you can forget about everything that is being tested and concentrate on the next feature. The test will make sure that your stable foundation keeps growing until you can erect your skyscraper on it.
Regarding "regex": the grammar of SQL is not regular, but it is context-free; subexpressions are the key to seeing this. You may want to write a context-free parser for SQL to check for syntax errors.
But ask yourself: what is it you want to test for? What are your correctness criteria?
If you are generating the code, why not also generate the tests?
Short of that, I would test/debug generated code in the same way you would test/debug any other code without unit tests (i.e. by reading it, running it and/or having it reviewed by others).
You don't have to test all cases. Make a collection of example calls, being sure to include as many as possible of the difficult aspects the function will have to handle, then check whether the generated code is correct.
I would have a suite of tests that put in a known input and check that the generated SQL is as expected.
You're never going to be able to write a test for every scenario but if you write enough to cover at least the most regular patterns you can be fairly confident your generator is working as expected.
If you find it doesn't work in a specific scenario, write another test for that scenario and fix it.
