Is there a way to translate database table rows into Prolog facts? - database

After doing some research, I was amazed by the power of Prolog to express queries very simply, almost like telling the machine verbally what to do. I looked into it because I've become really bored with Propel and PHP at work.
So I've been wondering whether there is a way to translate database table rows (from Postgres, for example) into Prolog facts. That way I could stop writing so many boring joins and using an ORM, and instead write something like this to get what I want:
mantenedora_ies(ID_MANTENEDORA, ID_IES) :-
    papel_pessoa(ID_PAPEL_MANTENEDORA, ID_MANTENEDORA, 1),
    papel_pessoa(ID_PAPEL_IES, ID_IES, 6),
    relacionamento_pessoa(_, ID_PAPEL_IES, ID_PAPEL_MANTENEDORA, 3).
To see why I've become bored, look at this post. The code there could be replaced by the few simple lines above, which are much easier to read and understand. I'm just curious, since it would be impossible to replace things around here.
It would also be nice if something like this could be done in PHP. Does anyone know of anything like that?

Check the ODBC interface of SWI-Prolog (there may be something equivalent for other Prolog implementations too):
http://www.swi-prolog.org/pldoc/doc_for?object=section%280,%270%27,swi%28%27/doc/packages/odbc.html%27%29%29

I can think of a few approaches to this:
1. On initialization, call a predicate that selects all the data from a table and asserts it into the Prolog database. Do this for each table. You will need to declare the shape of each row, e.g. :- dynamic ies_row/4.
2. You could modify load_files by overriding the user:prolog_load_file/2 hook and, from there, do something similar to #1. This has the benefit of looking like a load_files call. http://www.swi-prolog.org/pldoc/man?predicate=prolog_load_file%2F2 ... This documentation mentions library(http_load), but I cannot find it anywhere (I was interested in this recently)!
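A variation on the first approach is to generate the facts as text outside Prolog and then consult the resulting file. The following is only a rough sketch in Python, using an in-memory SQLite database as a stand-in for Postgres. The column names and sample rows are made up for illustration, and the table name is assumed to be a valid Prolog atom already (lowercase, no spaces).

```python
import sqlite3


def prolog_term(value):
    """Render one column value as a Prolog term (numbers bare, text as quoted atoms)."""
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape backslashes and single quotes so the quoted atom stays valid.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def table_to_facts(conn, table):
    """Return a dynamic declaration plus one fact per row of `table`."""
    # The table name is interpolated directly, so only pass trusted names.
    cursor = conn.execute('SELECT * FROM "{}"'.format(table))
    arity = len(cursor.description)
    lines = [":- dynamic {}/{}.".format(table, arity)]
    for row in cursor:
        args = ", ".join(prolog_term(v) for v in row)
        lines.append("{}({}).".format(table, args))
    return lines


# Hypothetical sample data modelled on the papel_pessoa/3 goals in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papel_pessoa (id INTEGER, id_pessoa INTEGER, tipo INTEGER)")
conn.executemany("INSERT INTO papel_pessoa VALUES (?, ?, ?)", [(10, 100, 1), (11, 200, 6)])

facts = table_to_facts(conn, "papel_pessoa")
print("\n".join(facts))
```

Writing the printed lines to a .pl file and consulting it gives the same result as asserting row by row, at the cost of the facts being a snapshot rather than live data.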

There is Draxler's Prolog to SQL compiler, which translates certain patterns (like the conjunction you wrote) into the more verbose SQL joins. You can find more information in the related post (prolog to SQL converter).
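To give a feel for what such a translation involves, here is a toy sketch in Python. It is not Draxler's compiler and handles only a flat conjunction of goals: constants become filters, and a variable that appears more than once becomes an equijoin condition. The schema (column names per table) is an assumption, since the question does not show the real one.

```python
def conjunction_to_sql(head_vars, goals, schema):
    """Translate a conjunction of goals into a SELECT with equijoins.

    goals: list of (table, args); each arg is a variable name (str),
    "_" for an anonymous variable, or an int constant.
    schema: maps table name -> list of column names, in argument order.
    """
    from_items, conditions, first_seen = [], [], {}
    for i, (table, args) in enumerate(goals):
        alias = "t{}".format(i)
        from_items.append("{} AS {}".format(table, alias))
        for column, arg in zip(schema[table], args):
            ref = "{}.{}".format(alias, column)
            if arg == "_":
                continue  # anonymous variable: no constraint
            if isinstance(arg, int):
                conditions.append("{} = {}".format(ref, arg))  # constant: filter
            elif arg in first_seen:
                # Variable seen before: join on its first occurrence.
                conditions.append("{} = {}".format(ref, first_seen[arg]))
            else:
                first_seen[arg] = ref
    select = ", ".join("{} AS {}".format(first_seen[v], v) for v in head_vars)
    sql = "SELECT {} FROM {}".format(select, ", ".join(from_items))
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical column names; the goals mirror the rule in the question.
schema = {
    "papel_pessoa": ["id", "id_pessoa", "id_tipo_papel"],
    "relacionamento_pessoa": ["id", "id_papel_a", "id_papel_b", "id_tipo"],
}
goals = [
    ("papel_pessoa", ["ID_PAPEL_MANTENEDORA", "ID_MANTENEDORA", 1]),
    ("papel_pessoa", ["ID_PAPEL_IES", "ID_IES", 6]),
    ("relacionamento_pessoa", ["_", "ID_PAPEL_IES", "ID_PAPEL_MANTENEDORA", 3]),
]
sql = conjunction_to_sql(["ID_MANTENEDORA", "ID_IES"], goals, schema)
print(sql)
```

The three-goal rule expands into a three-way join with five conditions, which is the verbosity the question complains about.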
But beware that Prolog has its weaknesses too, especially regarding aggregates. Without a library, getting sums, counts and the like is not very easy, and such libraries are neither common nor easy to use.
I think you could try to specialize the PHP DB interface for equijoins, using the built-in features that let you shorten the query text (when this results in more readable code). Working in SWI-Prolog / ODBC, where (as in PHP) you need to compose SQL, I effectively found myself working that way to handle something very similar to what you have shown in the other post.
Another approach I found useful: I wrote a parser for the subset of SQL used by the MySQL backup interface (phpMyAdmin, really). So I routinely dump my CMS's DB locally, load it into memory, perform whatever task I need, computing and writing (or applying) the insert/update/delete statements, and then upload these. This is possible because the DB is small enough to fit in memory. I developed, and now maintain, a small e-commerce site with this naive approach.
Writing Prolog from PHP should not be too difficult: I'd try to modify an existing interface, like the awesome Adminer, which already offers a choice among basic serialization formats.

Related

JOOQ or alternatives for code reduction in generated classes

For a bigger project, with 100+ tables for example, the size of the generated code (and therefore which classes and functions are or are not needed) is critical. So here is my question: what is the best way to reduce the code as much as possible when using jOOQ for class generation, or are there alternatives that generate classes as efficiently as possible?
I know one option is to use include/exclude patterns such as:
<excludes>
TABLE
|DATA.*
</excludes>
This reduces automatically the code by eliminating unneeded tables/routines/etc.
Are there any other possibilities or better solution to do so? Is that it? Better said, can I reduce the code even more?
From your comments, I take that you are really keen on avoiding pretty much every line of code that you deem unnecessary, perhaps even including generated Javadoc.
This has not been a popular use case for any jOOQ user so far, which is why there aren't many means of achieving what you want through out of the box functionality. As you've already discovered, you can reduce the number of objects (e.g. tables) being included, as well as the object types (e.g. tables, procedures, sequences, etc.), but you cannot really influence the layout of generated code yet in jOOQ 3.x.
This means you'll have to roll your own. Either:
Implement your own code generator, taking inspiration from the JavaGenerator
Write the "generated" classes manually, taking inspiration from the JavaGenerator's output

Convenience of a PostgreSQL custom C function vs. PL/pgSQL

Let me say up front that my own answer to the question in the title is yes, in my case it is convenient, but I would like to hear from the experts.
I have developed a lot of PL/pgSQL functions and just one in C, but I have already understood that the learning curve is definitely steeper.
In my case I need a real development language, which PL/pgSQL sometimes is not, but I also need performance; otherwise I would have looked at Python.
Now to the question.
Mainly I need to retrieve data with some selects and joins, process it (sometimes in complex ways) and return a table of data.
In terms of execution time, is a C function faster for this kind of use?
I appreciate any comments.
luca
"Mainly I need to retrieve data with some selects and joins, process it (sometimes in complex ways) and return a table of data."
I would go with pl/pgsql for this, as that's what it is designed for. In general, pl/pgsql performs very well within its problem domain, and I doubt you are likely to get significantly better performance by going with C. To the extent you can push your elaborations into the main query, all the better performance-wise.
This assumes that your elaborations can be done with existing functions and do not involve a huge amount of complex data manipulation (in particular, say, converting between datatypes, like arrays and sets). If they do involve that kind of manipulation, I would still put the main query and light manipulation in the pl/pgsql, and put the specific operations that need to be tuned in C. There are two reasons for doing this:
It means less C code, which means the C code is easier to read, follow, and prove correct.
It separates concerns so that you can use similar manipulations elsewhere.
There's a lot of performance tuning that has gone into pl/pgsql for its problem domain and reinventing all of that would be a lot of work both in development and testing. To the extent you can leverage tools that are already there you can get the performance you need with a lot less effort and a lot more in the way of guarantees.
EDIT
If you want to write PL/PGSQL code that performs well, you want it to be a large main query with modest support logic. The more you can push into your query the better, and the more of your elaborations you can do in SQL (with possible C functions as mentioned above), the better. Not only does this mean better performance, it also means better maintainability. As ArtemGr mentioned, certain operations are very expensive in PL/PGSQL, and in these cases you want to supplement with C code in order to get the performance you need.
I know C/C++ well, and for me it's easier to write a PostgreSQL function in C++ than to learn the intricacies of pgSQL syntax and work around its limitations. I'd say go with the language you (and the rest of your team) are more familiar with. C should be faster than pgSQL (and Tcl, Perl, Python) for complex data manipulation, usually 5-10 times faster. Javascript (http://code.google.com/p/plv8js/) might be nearly as fast as C if it has a chance to spin up its JIT. Python code can actually use a Cython extension under the hood, which might be nearly as fast as C.
You should probably measure how much time is spent in the data manipulation in question, relative to the time spent in I/O, before making a decision. In some domains C isn't faster; for example, Tcl and Javascript have very good regular expression engines.

Achieving high-performance transactions when extending PostgreSQL with C-functions

My goal is to achieve the highest performance available for copying a block of data from the database into a C-function to be processed and returned as the result of a query.
I am new to PostgreSQL and I am currently researching possible ways to move the data. Specifically, I am looking for nuances or keywords related specifically to PostgreSQL to move big data fast.
NOTE:
My ultimate goal is speed, so I am willing to accept answers outside of the exact question I have posed as long as they give big performance results. For example, I have come across the COPY keyword (PostgreSQL only), which moves data from tables to files quickly, and vice versa. I am trying to stay away from processing that is external to the database, but if it provides a performance improvement that outweighs the obvious drawback of external processing, then so be it.
It sounds like you probably want to use the server programming interface (SPI) to implement a stored procedure as a C language function running inside the PostgreSQL back-end.
Use SPI_connect to set up the SPI.
Now SPI_prepare_cursor a query, then SPI_cursor_open it. SPI_cursor_fetch rows from it and SPI_cursor_close it when done. Note that SPI_cursor_fetch allows you to fetch batches of rows.
SPI_finish to clean up when done.
You can return the result rows into a tuplestore as you generate them, avoiding the need to build the whole table in memory. See examples in any of the set-returning functions in the PostgreSQL source code. You might also want to look at the SPI_returntuple helper function.
See also: C language functions and extending SQL.
If maximum speed is of interest, your client may want to use the libpq binary protocol via libpqtypes so it receives the data produced by your server-side SPI-using procedure with minimal overhead.

Template Matching for relational database

I am trying to do the following:
We are trying to design a fraud detection system for the stock market.
I know the specifications of the frauds (they are like templates).
So I want to know whether I can design a template and find all the records that match it.
Note:
I can't use traditional queries because the templates are complex.
For example, one of the frauds is circular trading, which looks like this:
A bought from B, B bought from C, and C bought from A (it's a cycle),
and this cycle can include 4 or 5 people.
Are there any good suggestions for this situation?
I don't see why you can't use "traditional queries" as you've stated. SQL can be used to write extraordinarily complex queries. For that matter I'm not sure that this is a hugely challenging question.
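As an illustration of how far a plain query can go, circular trading can be expressed with a recursive common table expression. The sketch below runs it in Python against an in-memory SQLite database. The trades(buyer, seller) table and the sample rows are assumptions, and trader names are assumed not to contain commas, since the path is tracked as a comma-separated string.

```python
import sqlite3

# Each cycle is reported once, starting from its smallest trader name,
# and no trader other than the starting one may appear twice.
CYCLE_QUERY = """
WITH RECURSIVE chain(start, last, path, hops) AS (
    SELECT buyer, seller, ',' || buyer || ',' || seller || ',', 1
    FROM trades
    WHERE seller > buyer
    UNION ALL
    SELECT c.start, t.seller, c.path || t.seller || ',', c.hops + 1
    FROM chain AS c
    JOIN trades AS t ON t.buyer = c.last
    WHERE c.last <> c.start
      AND c.hops < :max_len
      AND t.seller >= c.start
      AND (t.seller = c.start OR instr(c.path, ',' || t.seller || ',') = 0)
)
SELECT DISTINCT trim(path, ',') AS cycle, hops
FROM chain
WHERE last = start AND hops >= 3
ORDER BY cycle
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (buyer TEXT, seller TEXT)")
# Hypothetical trades: (buyer, seller) means "buyer bought from seller".
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A"), ("E", "A")],
)

cycles = conn.execute(CYCLE_QUERY, {"max_len": 5}).fetchall()
for cycle, hops in cycles:
    print(cycle, hops)
```

Each result reads left to right as "bought from", and max_len caps the number of traders in a cycle (5 here, matching the question). PostgreSQL and SQL Server support recursive CTEs as well, although the string functions differ.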
Firstly, I'd look at the behavior you have described as very transactional, so I would treat the transactions as a model. I'd likely have a transactions table with columns like buyer, seller, amount, etc.
Alternatively, you could give the shares their own table and store, say, the previous 100 owners of each share in that same table using STI (Single Table Inheritance), by putting all the primary keys of the owners into an "owners" column in your shares table, like 234/823/12334/1234/... That way you can run complex queries to see whether a share was owned by the same person, or look for patterns in the string really easily and quickly.
-update-
I wouldn't suggest making up a "small language". I don't see why you'd want to do that when you have a huge selection of wonderful languages and databases to choose from, all of which have well-refined and tested methods for solving exactly what you are doing.
My best advice is to pop open your IDE (thumbs up for TextMate) and pick your favorite language (Ruby in my case). Find some sample data, create your database and start writing some code! You can't go wrong experimenting like this; it will expose better ways to go about it than we can dream up here on Stack Overflow.
Definitely Data Mining. But as you point out, you've already got the models (your templates). Look up fraud DETECTION rather than prevention for better search results?
I know some banks use SPSS PASW Modeler for fraud detection. It is very intuitive, and you can see what you are doing as you play around with the data, so you can implement your templates. I agree with Joseph: you need to get playing and make some new data structures.
Maybe a timeseries model?
Theoretically you could develop a "Small Language" first, something with a simple syntax (that makes expressing the domain - in your case fraud patterns - easy) and from it generate one or more SQL queries.
As with most solutions, this can be thought of as a slider: at one extreme there is the "full Fraud Detection Language"; at the other, you could just build stored procedures for the most common cases, and write new stored procedures that use the more "basic" blocks you wrote before to implement the various patterns.
What you are trying to do falls under the Data Mining umbrella, so you could also try to learn more about it: maybe you can find a Data Mining package for your specific DB (you didn't specify which) and see if it helps you find common patterns in your data.

Obfuscate a SQL Server Db schema

When posting example code or filing bug reports based on a real production app, it would be helpful to have some way to change the table and column names to not potentially give away information about the internals of the app. Doing it by hand without breaking things is time consuming. Does anything automatic exist? Ideally it would use real English words so they are more easily referred to than random text strings.
As long as you don't use real data, I don't see what the issue is. Most apps are fairly obvious based on the requirements, i.e. CRM system = (customer name, address, etc.) or (customer name, addressid, etc., with some address table holding the parts of the address). Knowing your schema tells me nothing about how you implement your app. Generally, without the stored procedures or program code it would be hard to steal any intellectual property. Even if you were the NSA or something (InternetIP, PacketHeadingID, PacketDetailID, TimeStampID), the structure of the tables would still give me no information on how your system for logging all the internet traffic actually works. I also wouldn't know anything that is logged.
I don't know of anything offhand that does what you are requesting, but I would think it is fairly easy to write a script to do it on your own. Look at the table columns and datatypes, call text columns "TextColumn1", int columns "IntColumn2", etc., build a table of substitutions, and then perform the substitutions globally in the script file. I would think this is a fairly easy Python/Perl/PowerShell/Ruby/VbScript program.
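A minimal sketch of such a script in Python might look like the following. The column list is supplied by hand here (in practice it could be read from INFORMATION_SCHEMA.COLUMNS), and the names and types are invented for the example. Note that it replaces whole-word matches everywhere, including inside string literals and comments, so the output still needs a quick review.

```python
import re


def build_substitutions(columns):
    """Map each real column name to a neutral name derived from its type.

    columns: list of (column_name, sql_type) pairs.
    """
    prefixes = {"int": "IntColumn", "varchar": "TextColumn", "datetime": "DateColumn"}
    mapping = {}
    for number, (name, sql_type) in enumerate(columns, start=1):
        prefix = prefixes.get(sql_type.lower(), "Column")
        mapping[name] = "{}{}".format(prefix, number)
    return mapping


def obfuscate(script, mapping):
    """Replace every whole-word occurrence of a mapped name in `script`."""
    if not mapping:
        return script
    # Longest names first, so a short name never wins over a longer alternative.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    lookup = {name.lower(): new for name, new in mapping.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], script)


# Hypothetical columns and query, for illustration only.
mapping = build_substitutions([("CustomerName", "varchar"), ("CreditLimit", "int")])
script = "SELECT CustomerName, CreditLimit FROM Accounts WHERE CreditLimit > 1000"
result = obfuscate(script, mapping)
print(result)
```

Table names can be handled the same way by adding them to the mapping. Swapping the generated names for entries from a word list would give the more readable English names the question asks for.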
I agree that there's no real need to do so, but if you feel that way, take a look at anonymizers, usually used to protect the data and not the schemas, but you could easily apply those approaches to schemas as well.
See this paper (which describes this framework), especially page 8 onwards, for different anonymization methods, although replacing column names with static strings would probably be good enough anyway.

Resources