Why is the CommandID in PostgreSQL needed - database

I am struggling to understand why the CommandId (documented here) is necessary in PostgreSQL. The CommandId is sometimes also called cmin and cmax.
I understand that the Transaction ID (xmin/xmax) is necessary. However the cmin/cmax values are documented to only be relevant for the current transaction.
I have been looking around pretty much everywhere, but even the .c/.h files in the PostgreSQL code base do not talk about it a lot.

Related

Achieving high-performance transactions when extending PostgreSQL with C-functions

My goal is to achieve the highest performance available for copying a block of data from the database into a C-function to be processed and returned as the result of a query.
I am new to PostgreSQL and I am currently researching possible ways to move the data. Specifically, I am looking for nuances or keywords related specifically to PostgreSQL to move big data fast.
NOTE:
My ultimate goal is speed, so I am willing to accept answers outside of the exact question I have posed as long as it gets big performance results. For example, I have come across the COPY keyword (PostgreSQL only), which moves data from tables to files quickly; and vice versa. I am trying to stay away from processing that is external to the database, but if it provides a performance improvement that out-weighs the obvious drawback of external processing, then so be it.
It sounds like you probably want to use the server programming interface (SPI) to implement a stored procedure as a C language function running inside the PostgreSQL back-end.
Use SPI_connect to set up the SPI.
Now SPI_prepare_cursor a query, then SPI_cursor_open it. SPI_cursor_fetch rows from it and SPI_cursor_close it when done. Note that SPI_cursor_fetch allows you to fetch batches of rows.
SPI_finish to clean up when done.
You can return the result rows into a tuplestore as you generate them, avoiding the need to build the whole table in memory. See examples in any of the set-returning functions in the PostgreSQL source code. You might also want to look at the SPI_returntuple helper function.
See also: C language functions and extending SQL.
If maximum speed is of interest, your client may want to use the libpq binary protocol via libpqtypes so it receives the data produced by your server-side SPI-using procedure with minimal overhead.

Is there a way to translate database table rows into Prolog facts?

After doing some research, I was amazed with the power of Prolog to express queries in a very simple way, almost like telling the machine verbally what to do. This happened because I've become really bored with Propel and PHP at work.
So, I've been wondering if there is a way to translate database table rows (Postgres, for example) into Prolog facts. That way, I could stop using so many boring joins and using ORM, and instead write something like this to get what I want:
mantenedora_ies(ID_MANTENEDORA, ID_IES) :-
papel_pessoa(ID_PAPEL_MANTENEDORA, ID_MANTENEDORA, 1),
papel_pessoa(ID_PAPEL_IES, ID_IES, 6),
relacionamento_pessoa(_, ID_PAPEL_IES, ID_PAPEL_MANTENEDORA, 3).
To see why I've become bored, look at this post. The code there would be replaced for these simple lines ahead, much easier to read and understand. I'm just curious about that, since it will be impossible to replace things around here.
It would also be cool if something like that was possible to be done in PHP. Does anyone know something like that?
check the ODBC interface of swi-prolog (maybe there is something equivalent for other prolog implementations too)
http://www.swi-prolog.org/pldoc/doc_for?object=section%280,%270%27,swi%28%27/doc/packages/odbc.html%27%29%29
I can think of a few approaches to this -
On initialization, call a method that performs a selects all data from a table and asserts it into the db. Do this for each db. You will need to declare the shape of each row as :- dynamic ies_row/4 etc
You could modify load_files by overriding user:prolog_load_files. From this activity you could so something similar to #1. This has the benefit of looking like a load_files call. http://www.swi-prolog.org/pldoc/man?predicate=prolog_load_file%2F2 ... This documentation mentions library(http_load), but I cannot find this anywhere (I was interested in this recently)!
There is the Draxler Prolog to SQL compiler, that translates some pattern (like the conjunction you wrote) into the more verbose SQL joins. You can find in the related post (prolog to SQL converter) more info.
But beware that Prolog has its weakness too, especially regarding aggregates. Without a library, getting sums, counts and the like is not very easy. And such libraries aren't so common, and easy to use.
I think you could try to specialize the PHP DB interface for equijoins, using the builtin features that allows to shorten the query text (when this results in more readable code). Working in SWI-Prolog / ODBC, where (like in PHP) you need to compose SQL, I effettively found myself working that way, to handle something very similar to what you have shown in the other post.
Another approach I found useful: I wrote a parser for the subset of SQL used by MySQL backup interface (PHPMyAdmin, really). So routinely I dump locally my CMS' DB, load it memory, apply whathever duty task I need, computing and writing (or applying) the insert/update/delete statements, then upload these. This can be done due to the limited size of the DB, that fits in memory. I've developed and now I'm mantaining this small e-commerce with this naive approach.
Writing Prolog from PHP should be not too much difficult: I'd try to modify an existing interface, like the awesome Adminer, that already offers a choice among basic serialization formats.

Oracle equivalent to Ingres inquire_sql or inquire_ingres?

I'm converting some legacy embedded-Ingres C code to work against Oracle. I've found references to functions "inquire_ingres()" and "inquire_sql()," which, per the docs at http://docs.ingres.com/ingres/9.3/sql-reference-guide/2275-inquiresql-function, allow a program to gather runtime information about the status and results of the last SQL statement that the program issued.
Does Oracle provide similar convenience functionality, or am I going to have to just paw around some more in the innards of sqlca as I suspect I'm going to?
It looks like the answer is: you have to paw around in the innards of sqlca. There's a lot of good information buried in that struct though -- check out http://infolab.stanford.edu/~ullman/fcdb/oracle/or-proc.html#sqlca for some details.

Find redundant pages, files, stored procedures in legacy applications

I have several horrors of old ASP web applications. Does anyone have any easy ways to find what scripts, pages, and stored procedures are no longer needed? (besides the stuff in "old___code", "delete_this", etc ;-)
Chances are if the stored proc won't run, it isn't being used because nobody ever bothered to update it when sonmething else changed. Table colunms that are null for every single record are probably not being used.
If you have your sp and database objects in source control (and if you don't why don't you?), you might be able to reaach through and find what other code it was moved to production with which should give you a clue as to what might call it. YOu will also be able to see who touched it last and that person might know if it is still needed.
I generally approach this by first listing all the procs (you can get this from the system tables) and then marking the ones I know are being used off the list. Profiler can help you here as you can see which are commonly being called. (But don't assume that because profiler didn't show the proc that it isn't being used, that just gives you a list of the ones to research.) This makes the ones that need to be rearched a much smaller list. Depending on your naming convention it might be relatively easy to see what part of the code should use them. When researching don't forget that procs are called in places other than the application, so you will need to check through jobs, DTS or SSIS packages, SSRS reports, other applications, triggers etc to be sure something is not being used.
Once you have identified a list of ones you don't think you need, share it with the rest of the development staff and ask if anyone knows if the proc is needed. You'll probably get a a couple more taken off the list this way that are used for something specialized. Then when you have the list, change the names to some convention that allows you to identify them as a candidate for deletion. At the same time set a deletion date (how far out that date is depends on how often something might be called, if it is called something like AnnualXYZReport, then make that date a year out). If no one complains by the deletion date, delete the proc (of course if it is in source control you can alawys get it back even then).
Onnce you have gone through the hell of identifying the bad ones, then it is time to realize you need to train people that part of the development process is to identify procs that are no longer being used and get rid of them as part of a change to a section of code. Depending on code reuse, this may mean searching the code base to see if someother part of the code base uses it and then doing the same thing discussed as above, let everyone know it will be deleted on this date, change the name so that any code referncing it will break and then on the date to delete getting rid of it. Or maybe you can have a meta data table where you can put candidates for deletion at the time you know that you have stopped using something and send a report around to everyone once a month or so to determine if anyone else needs it.
I can't think of any easy way to do this, it's just a matter of identifying what might not be used and slogging through.
For SQL Server only, 3 options that I can think of:
modify the stored procs to log usage
check if code has no permissions set
run profiler
And of course, remove access or delete it and see who calls...

Obfuscate a SQL Server Db schema

When posting example code or filing bug reports based on a real production app, it would be helpful to have some way to change the table and column names to not potentially give away information about the internals of the app. Doing it by hand without breaking things is time consuming. Does anything automatic exist? Ideally it would use real English words so they are more easily referred to than random text strings.
As long as you don't use real data, I don't see what the issue is. Most apps are fairly obvious based on the requirements. ie CRM system = (customer name, address, etc...) or (customer name, addressid, etc.. with some address table with parts of the address, etc...). By knowing your schema I have no idea how you implement your app. Generally without the stored procedures/program code it would be hard to steal any intellectual property. Even if you were the NSA or something (InternetIP, PacketHeadingID, PacketDetailID, TimeStampID). Even with the structure of the tables I still would have no information on how your system to log all the internet traffic actually works. I also wouldn't know anything that is logged.
I don't know of anything off hand to do what you are requesting, but I would think it is fairly easy to write a script to do it on your own. Look at the table columns and datatypes and call text columns "TextColumn1", int columns "IntColumn2", etc. and build a table of substitutions, then perform the substitutions globally in the script file. I would think this is a fairly easy Python/Perl/PowerShell/Ruby/VbScript program.
I agree that there's no real need to do so, but if you feel that way, take a look at anonymizers, usually used to protect the data and not the schemas, but you could easily apply those approaches to schemas as well.
See this paper (which is the description of this framework) especially page 8 an onwards for different anonymization methods, although replacing column names for static strings might probably be good enough anyway.

Resources